Central Asiatic Scythians, Parthians & Turkics play a greater role in the ethnogenesis of Kurds than in the demography of other West Asian ethnic groups
Abstract
Genetic analyses by prominent scientists, together with our own independent and detailed formal analysis of high-density genotype datasets, indicate that Central Asian Iron-Age Indo-Iranic Scythians and Sarmatians played a significant role in the ethnogenesis of present-day Kurds and a few other Iranic populations. This coincides with the introduction of Indo-Iranian languages into the Iranian plateau during the time of the Medes and Parthians, 2,000 to 2,700 years ago, and predates the arrival of Turkic tribes on the Iranian plateau.
Additionally, since Kurds and Turkmen have co-inhabited various regions in northern Iraq and north-east Iran (Figs 1-3) for hundreds of years, more recent admixture between these two ethnic groups has likely added another layer to the genetic relatedness between some present-day Kurds and Central Asian Turkics, to the exclusion of other West Asian populations.
Using whole genome sequenced (WGS) ancient and modern DNA samples, along with formal qpAdm and qpWave analysis [7], our passing admixture models for Kurds, subject to the very stringent protocols outlined in the last section, show that present-day Kurds derive approximately 58% to 71% of their ancestry from a local West Asian pre-Indo-Iranian pastoralist stock, supplemented by 29% to 42% Scythian/Sarmatian/Parthian-related Indo-Iranian admixture from Central Asia (Figs 30-31). These admixture events, around 2,000 to 2,700 years ago, between the local pre-Indo-Iranic sheep herders of the Zagros mountains and Iranian plateau and the invading Scythians, Sarmatians, Parthians and Medes were likely responsible for the spread of Indo-Iranian languages in the region.
Herein we summarize the findings of the relevant papers [1,2,3,4,5] and present them alongside our own detailed genetic analysis using formal statistics [7].
Introduction
Kurds often possess facial morphological features associated with Turkic Central Asian populations (fig 4-9), more so than most of their neighboring populations such as Persians, Arabs, Armenians and Georgians.
The confounding question is how much of this East Asian-related gene flow is attributable to the Indo-Europeanization of the Kurdistan region by Iron Age Central Asian populations such as Scythians, Sarmatians and Parthians, and how much to more recent hybridization with Turkmen.
Findings from the following papers will be presented, along with our own formal analysis which corroborates their findings:
- In “Ancestry and demography and descendants of Iron Age nomads of the Eurasian Steppe” (Unterländer M, Palstra F, Lazaridis I et al. [1]), the authors show that, unlike most other populations in West Asia, Kurds derive ancestry from both Eastern and Western Scythians.
- In “Kurds HLA Genes: Its Implications in Transplantation and Pharmacogenomics” (Ali Amirzargar, Diego Rey, Ester Muñiz et al., Medical School, Tehran University of Medical Sciences, Tehran, Iran, 2015 [5]), the authors show that, among the Iranian and worldwide populations studied, Kurds are genetically closest to Iranian Gorgan Turkmen based on HLA-DRB1 haplotypes.
- In “Phylogeography, genetic diversity and demographic history of the Iranian Kurdish groups based on mtDNA sequences” (Zarei F, Rajabi-Maham H. J Genet. 2016 Dec;95(4):767-776. doi: 10.1007/s12041-016-0692-4. PMID: 27994175 [2]), the authors show that, of all European and Asian populations, Iraqi Kurmanji Kurds have the lowest Mt-DNA FST distance to Tatars, and of all West Asian and European populations, Iraqi Kurmanji Kurds have the lowest Mt-DNA FST distance to Turkmen.
- In “A genetic landscape reshaped by recent events: Y-chromosomal insights into central Asia” (Zerjal T, Wells RS, Yuldasheva N, Ruzibakiev R, Tyler-Smith C. Am J Hum Genet. 2002;71(3):466-482. doi:10.1086/342096 [4]), the authors show that, based on Y-DNA, Central Asians cluster closer to Kurds than to other West Asians.
Our own analysis using the qpWave program [7] from the Reich Lab's ADMIXTOOLS suite is consistent with the above findings.
Facial morphology of Kurds often overlaps with Central Asians
Discussion
Kurds descend from Scythians
The authors of “Ancestry and demography and descendants of Iron Age nomads of the Eurasian Steppe” [1] show that, unlike other populations in West Asia, Kurds have a non-zero posterior probability of descent from Eastern Scythians (Figs 7-11) in addition to a very high probability of descent from Western Scythians. In that study, Western Scythians are represented by Sarmatians from western Kazakhstan and the Aral Sea area, and Eastern Scythians by samples from the Aldy-Bel, Pazyryk and Zevakino-Chilikta cultures from eastern Kazakhstan.
We independently verified these results for Kurds by using qpWave to test for rank=0 cladality of various densely genotyped contemporary populations from West and Central Asia against higher-quality Scythian and other Iron Age samples from Mongolia.
In the final paragraphs below, we quantify the percentage of ancestry that present-day Iraqi Kurds derive from Iron-Age populations related to these Indo-Iranic Scythians.
The principle behind using qpWave in this mode is that if two source populations (pleft) are symmetrically related to all reference populations (pright), the data are consistent with the two sources forming a clade relative to those references.
qpWave [7] computes a matrix of f4-statistics over pairs of populations from the ‘left’ and ‘right’ sets, of the form f4(Left_i, Left_j; Right_k, Right_l).
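To make the f4 matrix concrete, here is a minimal Python sketch, assuming allele frequencies per population are already available as plain lists. The population names and helper functions are hypothetical and this is not ADMIXTOOLS code. It uses the first population of each set as a base, so entries are f4(Left_0, Left_i; Right_0, Right_j); f4 values for any other pairs are linear combinations of these. The real program also estimates a block-jackknife covariance for these statistics before computing chisq, which this sketch omits. For the rank=0 test, the degrees of freedom equal the number of matrix entries, (L-1)(R-1).

```python
from typing import Dict, List, Sequence


def f4(pa: Sequence[float], pb: Sequence[float],
       pc: Sequence[float], pd: Sequence[float]) -> float:
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    n = len(pa)
    return sum((pa[i] - pb[i]) * (pc[i] - pd[i]) for i in range(n)) / n


def f4_matrix(freqs: Dict[str, List[float]],
              left: List[str], right: List[str]) -> List[List[float]]:
    """Rows: left[1:], columns: right[1:]; entries f4(Left_0, Left_i; Right_0, Right_j)."""
    l0, r0 = freqs[left[0]], freqs[right[0]]
    return [[f4(l0, freqs[li], r0, freqs[rj]) for rj in right[1:]]
            for li in left[1:]]


def rank0_dof(n_left: int, n_right: int) -> int:
    """Degrees of freedom of the rank-0 test = number of matrix entries."""
    return (n_left - 1) * (n_right - 1)


# Hypothetical toy frequencies at three SNPs; "A" and "B" are identical,
# so every f4(A, B; R0, Rj) is exactly zero (a perfect clade).
freqs = {
    "A": [0.1, 0.5, 0.9], "B": [0.1, 0.5, 0.9],
    "R0": [0.2, 0.3, 0.4], "R1": [0.6, 0.1, 0.8], "R2": [0.0, 1.0, 0.5],
}
print(f4_matrix(freqs, ["A", "B"], ["R0", "R1", "R2"]))  # [[0.0, 0.0]]
print(rank0_dof(2, 17))  # 16
```

With two left populations and 17 right references, the rank-0 test therefore has 16 degrees of freedom.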
Fig 12 – Descent probability from Western and Eastern Scythians by population [1]
Fig 13 – Posterior probability of descent from Western Scythians (dark red) [1]
Comparison of Kurd HLA genes with other Iranic and worldwide populations
Most of the SNPs genotyped by commercial ancestry companies lie in non-coding intergenic regions of the genome. Mutations in these regions tend to accumulate at a greater rate than in protein-coding regions, which affect organism function and gene expression; this is likely due to purifying (negative) selection against deleterious mutations in coding regions. Thus it is advantageous to use coding regions for studying deeper phylogenetic relationships among ethnic groups.
Human leukocyte antigens (HLA genes) are a family of coding genes located on chromosome 6 and are the human version of the major histocompatibility complex (MHC). They consist of around 3 million bases and are very polymorphic in humans. Their main function is immune system regulation, and they are also associated with transplanted organ rejection.
A study led by the Tehran University Molecular Immunology Research Center [5] mapped the HLA genes of Kurds (Kurdistan province, Iran) and other Iranian populations and compared them with worldwide populations, in order to determine which ethnic groups would be suitable organ donors for Kurdish recipients. Of the Iranian and world populations studied, Iranian Gorgan Turkmen had the lowest genetic distances to Kurds (Figs 15-16).
Surprisingly, Persians are much more distant from Kurds based on HLA genes. Although Kurds and Persians share substantial recent genetic drift according to clustering programs, at a deeper genetic level, as suggested by HLA genes and haplogroups, the two are quite divergent, indicating different origins for these two ethnic groups.
Kurds are also significantly closer genetically to Iranian Gorgan Turkmen, Russians and Caucasians than to Persians. This is consistent with Kurds having one of the highest frequencies of Y-DNA haplogroup R1a in West Asia, and indicates greater Iron-Age steppe ancestry in Kurds than in Persians and other West Asians.
This is also consistent with Turkmen and Central Asians having lower FST distances to Kurds than to Persians and other West Asians based on research conducted on Y-DNA in [4], and Mt-DNA in [2].
Iraqi Kurds are the West Asian population genetically closest to Turkmen and Tatars based on Mt-DNA
Since Mt-DNA is maternally inherited, it can be used to trace maternal lineages far back in time. A study of various Iranic ethnic groups [2] showed that Iraqi Kurmanji Kurds are the West Asian population closest to Central Asian Turkmen and Tatars (Fig 19). The sampling locations of the Kurdish groups used in the study are shown in Fig 17.
The Mt-DNA FST distances shown in Fig 19-22 are based on the table shown in Fig 18.
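The FST values in Fig 18 come from the sequence-based analysis of [2]. As a much simpler stand-in for readers who want to experiment with the idea of "closest population by FST", the sketch below computes Nei's G_ST, an FST-type statistic, from haplogroup frequencies and returns the population with the lowest value to a target. This is not the method used in [2]; the function names and the population frequencies are hypothetical.

```python
from typing import Dict


def diversity(freqs: Dict[str, float]) -> float:
    """Haplotype diversity H = 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(pop1: Dict[str, float], pop2: Dict[str, float]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T for two equally weighted populations."""
    haplogroups = set(pop1) | set(pop2)
    pooled = {h: (pop1.get(h, 0.0) + pop2.get(h, 0.0)) / 2 for h in haplogroups}
    h_s = (diversity(pop1) + diversity(pop2)) / 2  # mean within-population diversity
    h_t = diversity(pooled)                        # diversity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def closest(target: str, pops: Dict[str, Dict[str, float]]) -> str:
    """Name of the population with the lowest G_ST to `target`."""
    others = [name for name in pops if name != target]
    return min(others, key=lambda name: nei_gst(pops[target], pops[name]))


# Hypothetical haplogroup frequencies, for illustration only.
pops = {
    "pop_X": {"H": 0.5, "U": 0.5},
    "pop_Y": {"H": 0.5, "U": 0.5},
    "pop_Z": {"H": 1.0},
}
print(closest("pop_X", pops))  # pop_Y
```

G_ST is 0 for populations with identical frequencies and 1 for populations fixed for different haplogroups.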
The results of this study are consistent with the aforementioned studies, indicating a closer genetic relationship between Central Asians and Kurds than between Central Asians and other West Asian and Iranic populations.
This study also found that Iraqi Kurmanji Kurds carry the maternal haplogroup M1a at a frequency of 6.6%.
Kurds cluster closest to Dungan Hui Xinjiang Chinese based on Y-DNA FST distances
In “A genetic landscape reshaped by recent events: Y-chromosomal insights into central Asia” [4], the authors showed that Kurds clustered closest to Dungan Hui Xinjiang Chinese based on Y-DNA microsatellite haplotypes (Fig 24).
The Y-DNA haplogroup frequencies are shown in Fig 25, and the haplogroup codes are described in Fig 2 of the paper [4]. Y-DNA and Mt-DNA FST distances convey relationships among populations at a deeper ancestral level than intergenic autosomal mutations, which are subject to more accumulated mutations and genetic drift.
Formal statistics using the qpWave program [7] in two-source cladality mode indicate that Central Asian Turkmen are genetically closer to Iraqi Kurmanji Kurds than to other West Asians
To corroborate the results of the aforementioned papers, we performed qpWave analysis using densely genotyped public samples (the Simons Genome Diversity Project, available on the Reich Lab website) and a WGS Iraqi Kurmanji Kurd sample (from an individual personally known to me, with four verified Kurmanji Kurd grandparents).
The principle behind using qpWave in cladality mode is that if two source populations (pleft) are symmetrically related to all reference populations (pright), the data are consistent with the two sources forming a clade relative to those references.
qpWave [7] computes a matrix of f4-statistics over pairs of populations from the ‘left’ and ‘right’ sets, of the form f4(Left_i, Left_j; Right_k, Right_l).
The source pairs were (X, Central Asian Turkmen), where source 1, X, was allowed to vary. Results are sorted by chisq, with lower chisq values indicating a closer relationship between X and Turkmen.
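The sorting step itself is simple. A minimal sketch, assuming the chisq of each (X, Turkmen) pair test has already been computed; as illustrative inputs it uses the (X, Turkmen) chisq values quoted later in this post for Fig 28. The helper name is hypothetical.

```python
from typing import Dict, List, Tuple


def rank_by_chisq(results: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort candidate sources X by the chisq of their (X, Turkmen) test.

    A lower chisq means less evidence that X and Turkmen differ relative to
    the references, i.e. a closer relationship.
    """
    return sorted(results.items(), key=lambda item: item[1])


# (X, Turkmen) chisq values quoted later in this post for Fig 28
turkmen_pairs = {
    "Kazakhstan Hun-Sarmatian": 2158.0,
    "Early Medieval Turk": 627.0,
    "Khovsgol-LBA": 1430.0,
}

for name, chisq in rank_by_chisq(turkmen_pairs):
    print(f"{name:<26} {chisq:>6.0f}")
```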
To increase the power to discriminate between closely related populations, we used a large number (17) of high-quality, densely genotyped reference populations, most of them ancient. They are:
Pright REFERENCES: Mbuti, Ust-Ishim-UP, Papuans, Devils-Gate-N, Anatolia-N, EHG-Dipl, Taforalt, Surui, Kotias, Loschbour-Dipl, Israel-C, Kolyma-Mes-WGS, Sunghir6, Botai-EN-Dipl, Ukraine-Mes, Villabruna-UP, Yana-UP-GS
High-coverage WGS or diploid-genotyped ancient samples were used as references where possible, and we maintained a high SNP overlap of 600,000 to 800,000 SNPs throughout the tests. Results are shown in Fig 26.
To reduce the possibility that ancient DNA damage or incorrectly genotyped positions affect the results, we removed lower-coverage ancient samples from the analysis.
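The two filtering steps above (dropping low-coverage samples, then measuring the SNP overlap shared by everything that remains) can be sketched as follows. This assumes each sample comes with a mean coverage value and a set of called SNP IDs; the cutoff, sample names and SNP IDs are hypothetical, and real pipelines read EIGENSTRAT or PLINK files rather than Python sets.

```python
from typing import Dict, List, Set


def filter_by_coverage(coverage: Dict[str, float], min_cov: float) -> List[str]:
    """Keep samples whose mean depth is at least `min_cov` (a hypothetical cutoff)."""
    return [s for s, cov in coverage.items() if cov >= min_cov]


def snp_overlap(called: Dict[str, Set[str]], samples: List[str]) -> Set[str]:
    """SNP IDs genotyped in every selected sample (set intersection)."""
    if not samples:
        return set()
    shared = set(called[samples[0]])  # copy so the input is not modified
    for s in samples[1:]:
        shared &= called[s]
    return shared


# Hypothetical toy data: sample_B is low coverage and has few calls.
coverage = {"sample_A": 20.0, "sample_B": 0.3, "sample_C": 8.0}
called = {
    "sample_A": {"rs1", "rs2", "rs3"},
    "sample_B": {"rs1"},
    "sample_C": {"rs2", "rs3", "rs4"},
}
kept = filter_by_coverage(coverage, min_cov=1.0)
print(kept, sorted(snp_overlap(called, kept)))  # ['sample_A', 'sample_C'] ['rs2', 'rs3']
```

In this toy case, keeping the low-coverage sample would leave no SNP shared by all three samples.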
qpWave analysis indicates that Kazakhstan Hun-Sarmatian, Early Medieval Turk, Late-Xiongnu Han and Ulaanzuukh-SlabGrave Mongol samples are closer to Iraqi Kurmanji Kurds than to other West Asians
Using the aforementioned precautions and guidelines, we performed qpWave analysis with the recently published ancient Mongolian samples from [8]. The results are shown as a heat map in Fig 28, with darker green shading indicating a closer genetic relationship and darker red a more distant one.
For reference, the genetic composition of the Mongolian samples is shown in Fig 27, following Fig S7 in the supplement of reference [8]. Note that Fontovo_EN is substantially similar in genetic composition to the SlabGrave samples. This is consistent with our qpWave results in Fig 28, where the source pair [Fontovo-EN, Ulaanzuukh-SlabGrave] yields a chisq of 69, supporting the similarity between the two populations.
Changes in the genetic landscape in Kurdistan since the Chalcolithic and Iron Age
The qpWave analysis (Fig 28) provides clues about the changes in the genetic landscape of the Kurdistan area that accompanied its Indo-Europeanization.
We know, for example, that during the Chalcolithic, before the Indo-European migration waves into the Kurdistan area, both the language and the genetics of the Zagros sheep herders (the Iran-Chl-HajiFiruz samples in Fig 28) differed from those of today. This is clear when we compare the qpWave results for Iran-Chl, Iran-IA and the contemporary Kurd-IQ sample in Fig 28.
To investigate further, we ask which of the Mongolian or Kazakh Iron-Age samples in the table could account for the change from the Chalcolithic to the Iron-Age results, and then from the Iron-Age to the contemporary Kurdish result.
Genetic changes from the Iron-Age to the present in Kurdistan
We can immediately rule out Chemurchek, as there appears to be more Chemurchek-related ancestry in the Hasanlu-IA sample than in the Kurd-IQ sample. A better approach may be to ask where the biggest change lies between the Hasanlu-IA Iron Age sample and the Kurd-IQ contemporary sample.
Accordingly, Fig 28 shows that the biggest drop in chisq values from Hasanlu-IA to Kurd-IQ occurs for the samples shaded yellow (for example Late-Med Mongol), i.e. those containing Han Chinese-related admixture, as well as for Kazakhstan Hun-Sarmatian and the Siberian Khovsgol-MLBA samples, which consist primarily of Baikal-EBA-type ancestry.
Therefore, contemporary Kurds are likely a product of hybridization between an Iron-Age Hasanlu-IA-like population and a population similar to Late-Med Mongols or Kazakhstan Hun-Sarmatians. Exactly where this population mixing occurred should be a topic of future research. However, the changes in results from Iran-Chl to Hasanlu-IA to contemporary Kurd-IQ suggest that multiple mixing events occurred, the secrets of which are buried in the sands of time.
It is also likely that the hybridization of a Hasanlu-IA population with something similar to Kazakhstan Hun-Sarmatian or Late-Med Mongol to produce contemporary Kurds occurred somewhere north-east of present day Kurdistan, perhaps closer to the birthplace of Zoroaster and the center of the Parthian empire.
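The "biggest change" comparison used in this section can be sketched in a few lines. This assumes the chisq values for pairs involving Hasanlu-IA and pairs involving Kurd-IQ have been transcribed from Fig 28 into two dictionaries keyed by ancient sample; no values from the figure are reproduced here, and the function name is hypothetical. A positive drop means the ancient sample forms a closer pair with Kurd-IQ than with Hasanlu-IA.

```python
from typing import Dict, List, Tuple


def chisq_drops(before: Dict[str, float],
                after: Dict[str, float]) -> List[Tuple[str, float]]:
    """For each ancient sample scored against both populations, return
    before - after (positive = closer to the later population),
    sorted from the largest drop to the smallest."""
    shared = before.keys() & after.keys()
    drops = [(name, before[name] - after[name]) for name in shared]
    return sorted(drops, key=lambda item: item[1], reverse=True)
```

Called as chisq_drops(hasanlu_values, kurd_values), the first entries of the result are the ancient samples whose relationship changed most between the Iron Age and today.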
Shugnan Tajiks descended from a population substantially similar to Late Xiongnu Sarmatians
The qpWave output table in Fig 28 is also useful for inferring other relationships between ancient and modern populations. For example, Tajik-Shugnan and Late Xiongnu Sarmatians stand out with a chisq of only 80 (low relative to the other pairs in the table, given the high number of pright references used).
Turkmen and Karakalpak: superficially similar populations with significant differences in deeper ancestry
Turkmen and Karakalpak both inhabit the Turkmenistan area and superficially resemble each other. They also cluster near each other in clustering programs such as ADMIXTURE, which are more sensitive to recent shared drift; however, qpWave analysis shows significant differences between the two ethnic groups.
In Fig 28 we immediately observe that the Early Medieval Turkic samples, which are approximately 40% Sarmatian-related and 60% East Asian Ulaanzuukh-related [8], are genetically much more similar to Karakalpaks (primarily Kipchak Turkic-descended) than to Turkmen (primarily Oghuz Turkic-descended), with chisq values of 68 vs 627.
Turkmen (Oghuz) in contrast to Karakalpaks (Kipchaks) share more genetic drift with the Early Medieval Uyghur samples, which are about 40% Alan, 40% East Asian Ulaanzuukh, and 20% BMAC related [8].
The other notable difference between Turkmen and Karakalpaks is that the latter share significantly more genetic drift with the Late Medieval Mongolian and Kazakhstan Hun-Sarmatian samples.
Another observation is that although Turkmen overall have more East Asian ancestry than Kurds, the Kazakhstan Hun-Sarmatian samples share more genetic drift with contemporary Kurds than with Turkmen (chisq 1195 vs 2158). By contrast, the Kazakhstan Hun-Sarmatian samples share more genetic drift with Karakalpaks than with the contemporary Kurdish sample (chisq 410 vs 1195).
Additionally, Khovsgol-LBA, which is primarily Siberian Baikal-EBA-descended [8], shares substantially more genetic drift with Karakalpaks than with Turkmen (chisq 595 vs 1430). Thus, although Karakalpaks (Kipchaks) may superficially resemble Turkmen (Oghuz), the two neighboring ethnic groups harbor significantly different deep ancestry.
It is also quite likely that the significant BMAC-related ancestry in Turkmen makes them phenotypically more similar to Kurds and other Iranics than Karakalpaks are.
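The pairwise chisq values quoted in this section can be collected and compared side by side. A minimal sketch, using only the numbers stated in the paragraphs above (lower chisq = more shared drift); the dictionary layout and function name are hypothetical.

```python
from typing import Dict

# chisq values for (ancient, modern) source pairs as quoted in this section (Fig 28).
REPORTED: Dict[str, Dict[str, float]] = {
    "Early Medieval Turk": {"Karakalpak": 68.0, "Turkmen": 627.0},
    "Kazakhstan Hun-Sarmatian": {"Karakalpak": 410.0, "Kurd-IQ": 1195.0, "Turkmen": 2158.0},
    "Khovsgol-LBA": {"Karakalpak": 595.0, "Turkmen": 1430.0},
}


def closest_modern(row: Dict[str, float]) -> str:
    """Modern population with the lowest chisq in one row."""
    return min(row, key=row.__getitem__)


for ancient, row in REPORTED.items():
    ranking = sorted(row, key=row.__getitem__)  # closest first
    ratio = row["Turkmen"] / row["Karakalpak"]
    print(f"{ancient}: {' < '.join(ranking)} (Turkmen/Karakalpak chisq ratio {ratio:.1f})")
```

For all three ancient groups Karakalpaks come out closest, and for the Kazakhstan Hun-Sarmatian samples the ordering is Karakalpak, then Kurd-IQ, then Turkmen, as described above.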
Introduction of Indo-Iranian languages to the Iranian Plateau
At present, Indo-Iranian languages are spoken from the Black Sea region to Xinjiang province in China (Sarikol, spoken by Tajiks). Gernot Windfuhr, a professor of Iranian Studies, placed Kurdish under the Parthian branch, albeit with a Median substratum [9]. This is shown in Fig 29. Based on our personal fluency in Kurdish and Pashto, there appear to be dozens, if not hundreds, of nouns and verbs found in both Kurdish and Pashto to the exclusion of Farsi. Since the Parthians originated from the present-day Afghanistan region, this could be an artifact of Parthian having a greater influence on Kurdish and Pashto than on Farsi, which is classified as a Southwestern Iranian language.
It appears that Indo-Iranian became dominant in the Iran region during the Iron Age, with the Medes and Parthians, around 2,500 years ago. Unterländer, Palstra, Lazaridis et al. [1] showed that Kurds have an almost 100% probability of descent from Iron-Age Scythians (Figs 12-14).
Our own work, here and in “Impact of the Iron Age Saka and Scythians on the demography of Kurd” (2018), using formal statistical methods, showed that contemporary Kurds are genetically the most shifted from Chalcolithic Iranians along the East Asian axis. This is consistent with the findings of Unterländer, Palstra, Lazaridis et al. [1] (Figs 12-14).
The question then becomes how large a population effected this linguistic and genetic change to Indo-Iranian from the earlier language spoken by Chalcolithic Iranians. Assuming Kurds descend from the Chalcolithic Iranians excavated in the Zagros mountain region of Kurdistan, we can alternatively ask what percentage of contemporary Kurdish ancestry comes from the Iron-Age Parthians and Scythians who hybridized with the local pre-Iron-Age Kurdistan population around 2,000 to 2,700 years ago and effected the linguistic change to Indo-Iranian.
Using qpAdm [7], we produced good genetic models for Kurds from a 3-way combination of Chalcolithic Iranians + Neolithic Levant + Iron-Age Scythians & Sarmatians.
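The idea behind the admixture weights can be sketched in a few lines. If a target's allele frequencies are a weighted mix of three sources S0, S1, S2 with weights summing to 1, then for every reference Rj, f4(Target, S0; R0, Rj) = w1 * f4(S1, S0; R0, Rj) + w2 * f4(S2, S0; R0, Rj), and w0 = 1 - w1 - w2. The sketch below solves this by plain least squares on simulated data. It is not qpAdm itself, which weights the fit with a block-jackknife covariance and reports standard errors and p-values; the function name and inputs are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def f4(pa: Sequence[float], pb: Sequence[float],
       pc: Sequence[float], pd: Sequence[float]) -> float:
    """f4(A, B; C, D) averaged over SNPs."""
    n = len(pa)
    return sum((pa[i] - pb[i]) * (pc[i] - pd[i]) for i in range(n)) / n


def three_source_weights(freqs: Dict[str, List[float]], target: str,
                         sources: List[str], right: List[str]) -> List[float]:
    """Unweighted least-squares admixture weights for exactly three sources."""
    s0, s1, s2 = (freqs[s] for s in sources)
    t, r0 = freqs[target], freqs[right[0]]
    rest = [freqs[r] for r in right[1:]]
    y = [f4(t, s0, r0, rj) for rj in rest]   # target relative to source 0
    u = [f4(s1, s0, r0, rj) for rj in rest]  # source 1 relative to source 0
    v = [f4(s2, s0, r0, rj) for rj in rest]  # source 2 relative to source 0
    # 2x2 normal equations [uu uv; uv vv] [w1, w2] = [uy, vy], solved by Cramer's rule
    uu = sum(a * a for a in u)
    vv = sum(b * b for b in v)
    uv = sum(a * b for a, b in zip(u, v))
    uy = sum(a * c for a, c in zip(u, y))
    vy = sum(b * c for b, c in zip(v, y))
    det = uu * vv - uv * uv
    w1 = (uy * vv - vy * uv) / det
    w2 = (vy * uu - uy * uv) / det
    return [1.0 - w1 - w2, w1, w2]


if __name__ == "__main__":
    # Simulated data only: random source/reference frequencies and a target
    # built as an exact 0.6 / 0.1 / 0.3 mixture of the three sources.
    rng = random.Random(42)
    names = ["S0", "S1", "S2", "R0", "R1", "R2", "R3", "R4", "R5"]
    freqs = {n: [rng.random() for _ in range(2000)] for n in names}
    freqs["T"] = [0.6 * a + 0.1 * b + 0.3 * c
                  for a, b, c in zip(freqs["S0"], freqs["S1"], freqs["S2"])]
    print(three_source_weights(freqs, "T", ["S0", "S1", "S2"], names[3:]))
```

On simulated frequencies where the target is an exact mixture, the recovered weights match the mixing proportions up to rounding; on real data they carry sampling error, which is why qpAdm reports standard errors.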
We took many precautions in conducting this genetic analysis with qpAdm and qpWave [7]. They include:
- Using only the highest coverage/quality ancient samples from each source population
- Using whole genome sequenced Iraqi Kurd samples to maximize SNP overlap with ancient samples.
- Using qpWave [7] to rule out the possibility that Kurds can simply be modeled using 2 streams of ancestry consisting of Chalcolithic Zagrosian Iranians and Neolithic Levant.
- Using a high number of either WGS or diploid ancient pright reference populations to maximize detection of subtle genetic differences between Kurds and the 3 pleft sources.
The following pright references were used in the qpAdm analysis:
Jo-Hoan-Simmons
Devils-Gate-Neolithic-WGS
Iran-GanjDareh-N
Anatolia-Neolithic
EHG-I0061-DIPLOID
Morocco-Iberomaurusian
Loschbour-DIPLOID
Kolyma-Mesolithic-WGS
Russia-Sunghir6
Botai-EN-DIPLOID
Yana-UP-WGS
Depending on the Scythian/Sarmatian samples used we were able to maintain an overlap of 220,000 to 400,000 SNPs between the samples.
We required a p-value > 0.05 for a model to pass; one model reached a p-value of 0.99.
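To show how a passing threshold like p > 0.05 relates to chisq, here is a minimal sketch, assuming the qpAdm rank test has degrees of freedom equal to the number of pright references minus the number of sources (11 - 3 = 8 for the 3-way model with the 11 references listed above). The closed-form tail probability below is only valid for an even number of degrees of freedom; for the general case, scipy.stats.chi2.sf is the usual tool. Function names are hypothetical.

```python
import math


def qpadm_dof(n_right: int, n_sources: int) -> int:
    """Degrees of freedom of the qpAdm rank test: right pops minus sources."""
    return n_right - n_sources


def chi2_sf_even(x: float, k: int) -> float:
    """Upper-tail probability P(Chi2_k > x) for an even number of dof k.

    Closed form: exp(-x/2) * sum_{i < k/2} (x/2)^i / i!
    """
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def model_passes(chisq: float, n_right: int, n_sources: int,
                 alpha: float = 0.05) -> bool:
    """True if the model's tail probability exceeds `alpha`."""
    return chi2_sf_even(chisq, qpadm_dof(n_right, n_sources)) > alpha


print(qpadm_dof(11, 3))  # 8
```

For example, model_passes(chisq, n_right=11, n_sources=3) returns True only when the chisq is small enough that its tail probability with 8 degrees of freedom exceeds 0.05.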
Using whole genome sequenced (WGS) ancient and modern DNA samples, along with formal qpAdm and qpWave analysis [7], our passing admixture models for Kurds, subject to the very stringent protocols outlined above, show that present-day Kurds derive approximately 58% to 71% of their ancestry from a local West Asian pre-Indo-Iranian pastoralist stock, supplemented by 29% to 42% Scythian/Sarmatian/Parthian-related Indo-Iranian admixture from Central Asia (Figs 30-31). These admixture events, around 2,000 to 2,700 years ago, between the local pre-Indo-Iranic sheep herders of the Zagros mountains and Iranian plateau and the invading Scythians, Sarmatians, Parthians and Medes were likely responsible for the spread of Indo-Iranian languages in the region.
Those Indo-Iranians are also likely responsible for some Eurasian steppe male Y-DNA haplogroup lineages such as R1a Z93 and others that we observe with higher frequency among Kurds than among other West Asian Iranic and non-Iranic populations.
….To be continued ….
References
- Unterländer M, Palstra F, Lazaridis I, Pilipenko A, Hofmanová Z, Groß M, Sell C, Blöcher J, Kirsanow K, Rohland N, Rieger B, Kaiser E, Schier W, Pozdniakov D, Khokhlov A, Georges M, Wilde S, Powell A, Heyer E, Currat M, Reich D, Samashev Z, Parzinger H, Molodin VI, Burger J. Ancestry and demography and descendants of Iron Age nomads of the Eurasian Steppe. Nat Commun. 2017 Mar 3;8:14615. doi: 10.1038/ncomms14615. PMID: 28256537; PMCID: PMC5337992.
- Zarei F, Rajabi-Maham H. Phylogeography, genetic diversity and demographic history of the Iranian Kurdish groups based on mtDNA sequences. J Genet. 2016 Dec;95(4):767-776. doi: 10.1007/s12041-016-0692-4. PMID: 27994175.
- Madih, ‘Abbas-‘Ali. “The Kurds of Khorasan.” Iran & the Caucasus, vol. 11, no. 1, 2007, pp. 11–31. JSTOR, www.jstor.org/stable/25597312.
- Zerjal T, Wells RS, Yuldasheva N, Ruzibakiev R, Tyler-Smith C. A genetic landscape reshaped by recent events: Y-chromosomal insights into central Asia. Am J Hum Genet. 2002;71(3):466-482. doi:10.1086/342096
- Ali Amirzargar, Diego Rey. Kurds HLA Genes: Its Implications in Transplantation and Pharmacogenomics. Openmedicine Journal. 2015;2:43-47. DOI: 10.2174/1874220301401010043
- Dr. Michael Izady. Infographs, Maps and Statistics Collection. [https://gulf2000.columbia.edu/images/maps/Iran_Languages_2000_sm.png]
- Harney É, Patterson N, Reich D, Wakeley J. Assessing the Performance of qpAdm: A Statistical Tool for Studying Population Admixture. bioRxiv 2020.04.09.032664; doi: https://doi.org/10.1101/2020.04.09.032664
- Jeong C, Wang K, Wilkin S, Taylor WTT, Miller BK, Ulziibayar S, Stahl R, Chiovelli C, Bemmann JH, Knolle F, Kradin N, Bazarov BA, Miyagashev DA, Konovalov PB, Zhambaltarova E, Ventresca Miller A, Haak W, Schiffels S, Krause J, Boivin N, Myagmar E, Hendy J, Warinner C. A dynamic 6,000-year genetic history of Eurasia’s Eastern Steppe. bioRxiv 2020.03.25.008078; doi: https://doi.org/10.1101/2020.03.25.008078
- Windfuhr, Gernot (1975), “Isoglosses: A Sketch on Persians and Parthians, Kurds and Medes”, Monumentum H.S. Nyberg II (Acta Iranica-5), Leiden: 457–471