Indo-Europeanization of Iran & Kurdistan – The genetic substructure of the Indo-Iranian invaders
ABSTRACT
Sometime between the late Bronze-Age and the early Iron-Age, Iran experienced a linguistic and population replacement. Approximately 2700 years ago, the Indo-Iranian Medes established the Median empire which stretched from north-west Pakistan westwards to Turkey, and thereby brought genetic, linguistic, as well as ideological changes to Iran.
Although not much is known about this period in time in Iran, our comprehensive study highlights in detail the changes to the Chalcolithic western Iranian sheep herder genetic substructure that accompanied the linguistic and ideological changes there with the rise of the Medes, Parthians, and subsequent Central Asian steppe Indo-Iranians.
The linguistic changes in western Iran around 2700 years ago included the rise of the Indo-Iranian languages and their replacement of the Elamite language isolate, which was perhaps related to the Dravidian languages of India.
On the ideological front, the Indo-Iranians, starting with the Medes and continuing with the Parthians, brought the Zoroastrian religion from Central Asia into western Iran and the Kurdistan area. The Zoroastrian scriptures were written in the Old Iranian Avestan language. Zoroastrianism, along with Buddhism, the Rig Veda, and Avestan, originated in ancient Arachosia, Aria, Bactria, and Margiana around present day Afghanistan.
Our formal analysis with qpAdm indicates that, genetically, Medes can be successfully modeled (p=0.04) as follows:
- Approximately 51% ancient 7700 year old non Indo-Iranian Chalcolithic Zagrosian sheep herder;
- Approximately 15% Neolithic Levant farmer related. This component of ancestry possibly arrived in western Iran during the Old & Middle Assyrian Empires;
- Approximately 31% Indo-Iranian related to a 2800 year old Turkmenistan-IA sample from the Yaz culture (fig 1) of Central Asia. This sample appears to be 60% early Indo-Iranian derived from the neighboring Andronovo-MLBA culture (fig 4);
Our formal analysis using qpAdm, subject to the numerous precautions and strict protocols outlined under “Methods & Materials”, indicates that contemporary Iraqi Kurds are predominantly Mede derived (p-values 0.33 & 0.47, fig 2), with approximately:
- 71% Indo-Iranian Mede related ancestry as proxied by the 2700 year old Hasanlu-IA near a Kurdish region in Iran (fig 1);
- 7-11% ancient Neolithic Levant farmer related;
- 17-22% Indo-Iranian Central Asian Scythian/Sarmatian related. Some of this component of ancestry could have been mediated to Kurds via the 2000 year old Parthians. This is consistent with Gernot Windfuhr, professor of Iranian Studies, placing Kurdish under the Parthian branch, albeit with a Median substratum [9] (fig 9).
Finally, we validate our results on multiple levels as described below.
DISCUSSION
Over the past 3000 years, the genetic, linguistic, and ideological landscape of Iran has experienced massive change. Hitherto, the characterization of the genetic substructure of the Indo-Iranian invaders from the Eurasian steppe into Iran and Kurdistan over the last 3000 years has not been well studied.
Using qpWave and qpAdm workflow analysis [7] and the highest quality WGS and diploid contemporary Kurd and ancient genomes, we are able to confidently characterize the genetic substructure of the Central Asian Middle to Late Bronze Age (MLBA) and Iron-Age (IA) steppe components, as well as the local near-eastern ancient genetic components incorporated into the Medes and contemporary Kurds. Our analysis reveals multiple layers of Central Asian Indo-Iranian steppe admixture into the Iran region over the last 3000 years.
We use a multi-disciplinary approach to formally show:
- Indo-Iranians (Iran-IA-Hasanlu) were present in NW Iran 2700 years ago;
- The Mede has approximately 30% Indo-Iranian steppe admixture from a population related to the 2800 year old Yaz culture Turkmenistan-IA (Fig 3). This admixture can be characterized as approximately 18% proto-Indo-Iranian Andronovo-MLBA related admixture;
- Kurds have approximately 21% Central Asian steppe Yaz culture Turkmenistan-IA related admixture, which can be characterized as approximately 13% Andronovo-MLBA related admixture. QpAdm models indicate this 21% Turkmenistan-IA related admixture was mediated to Kurds via the Medes (fig 2-4);
- Above and beyond the 21% Yaz Turkmenistan-IA related Indo-Iranian admixture received via Medes, qpAdm models unequivocally agree that Kurds have received an additional 17-22% Central Asian Steppe-IA admixture subsequent to the Medes, likely mediated via Parthians, Scythians, and Sarmatians. The sub-character of this admixture is similar to that found in Tian-Shan Huns and Xiongnu Sarmatians (fig 2-4);
- Medes: Iran-Chl + Levant-N + Steppe-MLBA (via Turkmenistan-IA);
- Kurds: Iran-Chl + Levant-N + Steppe-MLBA + Steppe-IA (via Scythians/Sarmatians), or Mede + Levant-N + Steppe-IA
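The nested percentages in the list above follow from simple multiplication: if a share of a target's ancestry arrives via Turkmenistan-IA, and Turkmenistan-IA is itself about 60% Andronovo-MLBA related, then the Andronovo-MLBA related share of the target is the product of the two. A minimal sketch of that arithmetic, using the rounded figures quoted in this article (the function name is illustrative and is not part of qpAdm):

```python
def nested_share(via_fraction, source_fraction):
    """Share of a target's ancestry traceable to a deeper source when that
    source reaches the target only through an intermediate population."""
    return via_fraction * source_fraction

# ~60/40 Andronovo-MLBA/BMAC mix for Turkmenistan-IA, as quoted in the text.
ANDRONOVO_IN_TURKMENISTAN_IA = 0.60

mede_andronovo = nested_share(0.30, ANDRONOVO_IN_TURKMENISTAN_IA)
kurd_andronovo = nested_share(0.21, ANDRONOVO_IN_TURKMENISTAN_IA)

print(f"Mede: {mede_andronovo:.1%} Andronovo-MLBA related")
print(f"Kurd: {kurd_andronovo:.1%} Andronovo-MLBA related")
```

The results round to the 18% and 13% figures given in the list. The calculation treats the point estimates as exact and ignores their standard errors.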
In western Iran, Indo-Europeanization included transformation from the Chalcolithic Iranian sheep herder genetics, herein exemplified with a higher quality 7700 year old 7.29 coverage sample from the Zagrosian Kurdish region of Iran; HajiFiruz-Chl-I4349 [10], to the Eurasian Steppe admixed 2700 year old Indo-Iranian Medes, herein exemplified with the 2700 year old 2.04 coverage sample near the Iranian Kurdish region of Hasanlu Tepe; Hasanlu-IA-F38, hereinafter referred to as the “Mede”.
These 7700 year old Zagrosian sheep herders (Iran-Chl) were primarily a mix of the earlier Neolithic Iranian and Anatolian farmers, whereas the 2700 year old Medes (Hasanlu-IA) we herein show to be a mix of:
- 7700 year old Zagrosian herders (Iran-Chl);
- 8700 year old Neolithic Levant farmers (Levant-N-I0867, 2.08 coverage) from Israel;
- 2800 year old early Iron-Age Indo-Iranian from Turkmenistan (Turkmenistan-IA-DA382, 1.79 coverage). qpAdm indicates this sample to be a 60/40 mix of Andronovo-MLBA/BMAC (Gonur-BA) related (fig 5).
Most of the Levant-N related admixture in contemporary Kurds is very old
The qpAdm analysis indicates that most of the Neolithic Levant related admixture in contemporary Kurds is at least 2700 years old and was inherited from their Mede ancestors. We hypothesize that this type of admixture increased in western Iran after the Chalcolithic, most likely during the Old and Middle Assyrian Empires.
We see evidence of this in the qpAdm analysis shown in figs 2-4, where the amount of Levant-N in Kurds drops significantly when we replace Iran-Chl with the 2700 year old Mede, indicating that the Mede carried greater Levant-N related admixture than its Chalcolithic predecessor. Figure 3 corroborates this, where the Mede (Hasanlu-IA) is modeled with Iran-Chl + 14.9% Levant-N (p-value 0.04).
Whether the ancient Mesopotamians (likely Assyrians) who contributed this additional Levant-N related ancestry themselves received steppe admixture depends on when, between Iran-Chl and Iran-IA, the introgression occurred. If hybridization between the Mesopotamians (Assyrians) and the Medes or their ancestors occurred after the introduction of steppe admixture into NW Iran, then those Mesopotamians would have picked up steppe admixture via the Medes or their ancestors. Early Iron-Age or late Bronze-Age genomes from Mesopotamia would shed further light on this.
Kurds cannot be modeled without Scythian or Sarmatian related admixture
Consistent with the findings of Unterlander, Palstra, and Lazaridis et al [1] (figs 7-9) that Kurds (KTQ) have almost a 100% posterior probability of descent from western and eastern Scythians, our qpAdm analysis also strongly suggests a layer of Scythian related admixture subsequent to the Medes. When modeling Kurds using Iran-Chl, the amount of such Scythian admixture ranges from 26.7% (p-value 0.28) when using Hun-TianShan, to 31.3% (p-value 0.37) when using Sarmatian-Xiongnu. This is in addition to 16-18% admixture from a layer of 2800 year old Turkmenistan-IA (Steppe-MLBA proxy).
Thus total Bronze and Iron-Age Indo-Iranian steppe in Kurds is 43% (p-value 0.28) to 49% (p-value 0.37). This reflects the Central Asian steppe Indo-Iranian linguistic, genetic, and ideological impact upon the 7700 year old ancient inhabitants of western Iran.
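The totals quoted above can be reproduced by adding the Scythian/Sarmatian related share to the Turkmenistan-IA share for each model. The text gives the Turkmenistan-IA share only as a 16-18% range, so pairing 16% with the Hun-TianShan model and 18% with the Sarmatian-Xiongnu model is an assumption made here to match the stated totals; the dictionary layout is purely illustrative.

```python
# Percentages for the Iran-Chl based Kurd models, as quoted in the text.
# The per-model Turkmenistan-IA values are an assumed split of the 16-18% range.
models = {
    "Hun-TianShan": {"steppe_ia": 26.7, "turkmenistan_ia": 16.0},
    "Sarmatian-Xiongnu": {"steppe_ia": 31.3, "turkmenistan_ia": 18.0},
}

# Total steppe related ancestry = Iron-Age layer + Steppe-MLBA proxy layer.
totals = {
    name: parts["steppe_ia"] + parts["turkmenistan_ia"]
    for name, parts in models.items()
}

for name, total in totals.items():
    print(f"{name}: {total:.1f}% total steppe related")
```

Under that pairing the sums round to 43% and 49%.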
Kurds received much of their Steppe-MLBA related admixture from the Medes
Most researchers associate the 3000 to 5000 year old (Middle to Late Bronze Age) West Siberian Andronovo culture with early Indo-Iranian languages. This culture bordered the BMAC and Yaz Iranic cultures of Turkmenistan. This is consistent with qpAdm models indicating that the 2800 year old Iranic Turkmenistan-IA sample is better modeled as BMAC + Andronovo-MLBA with a 40/60 mix than as BMAC + Sintashta-MLBA, the latter yielding significantly lower p-values (fig 5).
QpAdm models indicate that Kurds received much of their Steppe-MLBA related admixture from the Medes (fig 2 & 4). We see in figure 2 that when Iran-Chl is replaced with Iran-IA-Hasanlu (Mede), the Turkmenistan-IA admixture in Kurds disappears, indicating that Medes carried enough of this type of admixture to eliminate the need for additional admixture of this character in Kurds (figs 2 & 4).
Validation of Kurd admixture models containing Scythians & Sarmatians
In spite of findings by Unterlander, Palstra, and Lazaridis et al [1] (figs 7-9) that Kurds (KTQ) have almost a 100% posterior probability of descent from western and eastern Scythians, and our own solid qpAdm analysis suggesting that contemporary Kurds received a layer of Scythian related admixture above their Mede ancestry, a few critics may hypothesize that the genetic affinity between Kurds and Scythians is simply an artifact of Kurds and Scythians sharing common ancestral proto-Indo-Iranian Sintashta-MLBA related genetic substructure.
We are able to address the aforementioned criticism and validate our models as follows. If the aforementioned criticism is valid then we should be able to reject qpAdm models for Kurds consisting of either:
- Iran-Chl + Levant + Turkmenistan-IA + primarily E.Asian admixed Mongolian-IA, and;
- Iran-IA + Levant + Turkmenistan-IA + primarily E.Asian admixed Mongolian-IA.
The reasoning is as follows. Sintashta-MLBA does not have E. Asian admixture. If the affinity between Kurds and Scythians were simply due to Sintashta-MLBA related ancestry shared by both, then Kurds could not have received E. Asian admixture merely through the Sintashta-MLBA related admixture mediated to them via the Yaz culture Indo-Iranian Turkmenistan-IA proxy, and models that include a primarily E. Asian admixed source should be rejected.
We thus tested qpAdm models for Kurds consisting of:
- Iran-Chl + Levant + Turkmenistan-IA + Han-Xiongnu [11], and;
- Iran-IA + Levant + Turkmenistan-IA + Han-Xiongnu
We show the results of this analysis in figure 4. The model using Iran-Chl shows Kurds with 9% Han-Xiongnu (p-value 0.01), and the one using Iran-IA (Mede) shows Kurds with 7.3% (p-value 0.01).
Han-Xiongnu was modeled using qpAdm in Choongwon Jeong et al [11] as:
59.8% Ulaanzuukh-SlabGrave + 32.7% Han + 7.5% Sarmatian (p-value 0.91). Since Ulaanzuukh is almost 100% E. Eurasian, this would indicate that Han-Xiongnu is over 95% E. Eurasian.
The qpAdm models shown in figure 4 show that we can reject a hypothesis of Kurds having genetic affinity with Scythians due to Kurds and Scythians sharing common ancestral proto-Indo-Iranian Sintashta-MLBA related genetic substructure. Additionally, the models shown in figure 4 are able to validate our hypothesis of Kurds receiving Central Asian Scythian admixture post 2700 year old Hasanlu-IA (Mede), since replacing Iran-Chl with Iran-IA doesn’t significantly reduce the Han-Xiongnu [11] related admixture in Kurds.
Corroboration of QpAdm results using qpWave cladeness check
We used qpWave in rank=0 cladeness mode for further confirmation of our qpAdm models. The principle behind using qpWave in this mode is that if two source populations (pleft) are symmetrically related to all reference populations (pright), then the source populations must share the same clade.
QpWave [7] computes a matrix of f-statistics of all possible pairs of populations in the ‘left’ and ‘right’ sets, of the form f4(Left_i, Left_j; Right_k, Right_l).
Similar to our analysis using qpAdm, we use a large, balanced east and west Eurasian set of high quality, densely genotyped diploid and WGS references to enable us to differentiate closely related populations. We detail this set of references under our “Methods & Materials” section.
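To make the f4 form concrete, here is a minimal sketch of the standard f4 estimator: the average over SNPs of the product of two allele-frequency differences. It is not the qpWave implementation, which also estimates standard errors with a block jackknife; the function name and the toy frequencies are hypothetical.

```python
def f4(a, b, c, d):
    """Estimate f4(A, B; C, D) as the mean over SNPs of (a - b) * (c - d).

    Each argument is a list of allele frequencies, one entry per SNP.
    """
    if not a or not (len(a) == len(b) == len(c) == len(d)):
        raise ValueError("need equal-length, non-empty frequency lists")
    products = [(pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)]
    return sum(products) / len(products)

# Toy allele frequencies at four SNPs (hypothetical values, not real data).
left_i = [0.10, 0.50, 0.80, 0.30]
left_j = [0.10, 0.50, 0.80, 0.30]   # identical to left_i
right_k = [0.20, 0.40, 0.90, 0.60]
right_l = [0.70, 0.10, 0.30, 0.50]

print(f4(left_i, left_j, right_k, right_l))
```

When the two left populations have identical frequencies, every term is zero, so f4 is zero for any pair of right populations. This is the intuition behind the cladeness check: two left populations that form a clade should give f4 values statistically indistinguishable from zero against all right pairs.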
We display the qpWave results in the heatmap shown in figure 9. The values displayed are chisq for rank=0. The lower the chisq (darker green) the closer the genetic relationship between the pair of samples. The darker the red, the more distant the relationship between the pair.
If we check the results of our Iraqi Kurdish sample, we note that it has a substantially closer genetic relationship with Scythians and ancient Mongolians, especially the populations with higher East Asian admixture, than either of its predecessors, Iran-Chl and Iran-IA (Mede), does. This indicates East Asian geneflow subsequent to the 2700 year old Hasanlu-IA, and is consistent with our analysis using qpAdm.
Genetic similarity of various ancient and modern populations with Kurds
We test the genetic relationship of Kurds to various West and East Eurasian ancient populations using qpWave in rank=0 cladeness mode as described above. Results are shown in figure 10. Iran-IA-Hasanlu (Mede), with a chisq of 22, has a p-value of 0.24 and can thus be considered a clade with Kurd-IQ.
Surprisingly, the Mongolian sample Uyghur-OLN002 [11] is genetically closer to Iraqi Kurd than the Iranian Chalcolithic sample and the Yaz culture 2800 year old Indo-Iranian Turkmenistan-IA are. This, along with the relatively low genetic distance between Iraqi Kurd and these eastern samples in comparison to the ancient west Asian populations, is consistent with our findings of substantial Central Asian Indo-Iranian ancestry for Kurds. This also explains why Kurds are east shifted with regard to their current geographical location in west Asia.
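As a rough consistency check on the chisq and p-value quoted above, the sketch below converts a chi-square statistic to an upper-tail p-value. It assumes the usual rank=0 degrees of freedom, (n_left - 1) * (n_right - 1), and assumes that the 19 pright populations listed under “Methods & Materials” were used for this test; neither assumption is stated explicitly in the text. The helper uses a closed form that is only valid for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df, using
    the closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(
        half**i / math.factorial(i) for i in range(df // 2)
    )

n_left, n_right = 2, 19             # two pleft samples; 19 pright names (assumed)
df = (n_left - 1) * (n_right - 1)   # assumed rank=0 degrees of freedom
p_value = chi2_sf_even_df(22.0, df)

print(df, round(p_value, 2))
```

With these assumptions the p-value comes out at roughly 0.23, close to the reported 0.24; the small gap could be due to the chisq of 22 being a rounded figure.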
It is also readily apparent in figure 10 that Kurds are relatively distant from both Sintashta-MLBA and Andronovo-MLBA. This is consistent with our findings that proto-Indo-Iranian geneflow into Kurds was mediated via Iron-Age Indo-Iranic populations such as Turkmenistan-IA and Scythians, both of which have substantial native Central Asian admixture.
METHODS & MATERIALS
Using whole genome sequenced (WGS) ancient and modern DNA samples, along with qpAdm and qpWave formal analysis [7], our passing admixture models for Kurds, subject to a set of very stringent protocols outlined in the last section, show that present day Kurds derive approximately 58% to 71% from a local West Asian pre-Indo-Iranian pastoralist stock, supplemented with 29% to 42% Scythian/Sarmatian/Parthian related Indo-Iranian admixture from Central Asia (fig 30-31). These admixture events around 2000-2700 years ago between the local pre-Indo-Iranic ancient sheep herders of the Zagros mountains and Iranian plateau, and the invading Scythian/Sarmatian/Parthians/Medes were likely responsible for the proliferation of Indo-Iranian languages in the region.
Using qpAdm [7], we were able to produce good genetic models for Kurds using a 3-way combination of Chalcolithic Iranians + Neolithic Levant + Iron-Age Scythian & Sarmatians.
We used many precautions conducting this genetic analysis using qpAdm and qpWave [7]. They include:
- Only using the highest coverage/quality ancient samples from each source population.
- Using whole genome sequenced Iraqi Kurd samples to maximize SNP overlap with ancient samples.
- Using qpWave [7] to rule out the possibility that Kurds can simply be modeled using 2 streams of ancestry consisting of Chalcolithic Zagrosian Iranians and Neolithic Levant.
- Using a high number of either WGS or diploid ancient pright reference populations to maximize detection of subtle genetic differences between Kurds and the 3 pleft sources.
The following pright references were used in the qpAdm analysis:
Jo-Hoan-Simmons
Devils-Gate-Neolithic-WGS
Iran-GanjDareh-N
Anatolia-Neolithic
EHG-I0061-DIPLOID
Morocco-Iberomaurusian
Loschbour-DIPLOID
Kolyma-Mesolithic-WGS
Russia-Sunghir6
Botai-EN-DIPLOID
Yana-UP-WGS
Depending on the Scythian/Sarmatian samples used we were able to maintain an overlap of 220,000 to 400,000 SNPs between the samples.
We used a p-value > 0.05 for passing models, with one model reaching a p-value of 0.99.
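For readers who want to set up a comparable run, the sketch below shows one way the left and right population lists for qpAdm might be prepared, with one label per line and the target listed first on the left. The pright labels are copied from the list above; the left labels (Kurd-IQ and the three sources) are taken from sample names mentioned in this article and are an assumption about how the samples are labeled in the dataset. The helper function is hypothetical and only builds text; it does not run qpAdm.

```python
PRIGHT = [
    "Jo-Hoan-Simmons",
    "Devils-Gate-Neolithic-WGS",
    "Iran-GanjDareh-N",
    "Anatolia-Neolithic",
    "EHG-I0061-DIPLOID",
    "Morocco-Iberomaurusian",
    "Loschbour-DIPLOID",
    "Kolyma-Mesolithic-WGS",
    "Russia-Sunghir6",
    "Botai-EN-DIPLOID",
    "Yana-UP-WGS",
]

def build_pop_lists(target, sources, right):
    """Return (popleft_text, popright_text), one population label per line.

    The target comes first in the left list, followed by the sources.
    """
    left = [target, *sources]
    overlap = set(left) & set(right)
    if overlap:
        raise ValueError(f"populations in both left and right: {sorted(overlap)}")
    return "\n".join(left) + "\n", "\n".join(right) + "\n"

# Assumed labels for a 3-way Kurd model; adjust to match the actual dataset.
popleft, popright = build_pop_lists(
    "Kurd-IQ",
    ["HajiFiruz-Chl-I4349", "Levant-N-I0867", "Sarmatian-Xiongnu"],
    PRIGHT,
)
print(popleft)
```

The overlap check guards against a population appearing on both sides, which would make the model ill-posed.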
For our qpWave rank=0 cladeness check the following references were used (pright):
Jo-Hoan-Simmons
Ust-Ishim-UP-DIPLOID
Papuans
Devils-Gate-Neolithic-WGS
Kostenki-14
Onge_1000G
Anatolia-Neolithic
EHG-I0061-DIPLOID
Morocco-Iberomaurusian
Surui
Kotias-CHG
Loschbour-DIPLOID
Israel-Chl
Kolyma-Mesolithic-WGS
Russia-Sunghir6
Botai-EN-DIPLOID
Yana-UP-WGS
Ukraine-Mesolithic
Villabruna
Introduction of Indo-Iranian languages to the Iranian Plateau
At present, Indo-Iranian languages are spoken from the Black Sea region to Xinjiang province in China (Sarikol spoken by Tajiks). Gernot Windfuhr, professor of Iranian Studies, placed Kurdish under the Parthian branch, albeit with a Median substratum [9]. This is shown in fig 9. Based on our personal fluency in the Kurdish and Pashto languages, there appear to be dozens, if not hundreds, of nouns and verbs which are found in both Kurdish and Pashto to the exclusion of Farsi. Since Parthians originated from the present day Afghanistan region, this could possibly be an artifact of Parthian having a greater influence on the Kurdish and Pashto languages than on Farsi, which is classified as a southwestern Iranian language.
It appears that Indo-Iranian became dominant in the Iran region during the Iron-Age with the Medes and Parthians around 2500 years ago. Unterlander, Palstra, and Lazaridis et al [1] showed that Kurds have an almost 100% probability of descent from Iron-Age Scythians (figs 6-8).
Our own work here and in “Impact of the Iron Age Saka and Scythians on the demography of Kurd” (2018), using formal statistical methods, showed that contemporary Kurds are genetically the most shifted from Chalcolithic Iranians on the East Asian axis. This is consistent with the findings of Unterlander, Palstra, and Lazaridis et al [1] (figs 6-8).
Those Indo-Iranians are also likely responsible for some Eurasian steppe male Y-DNA haplogroup lineages such as R1a Z93 and others that we observe with higher frequency among Kurds than among other West Asian Iranic and non-Iranic populations.
References
- Unterländer M, Palstra F, Lazaridis I, Pilipenko A, Hofmanová Z, Groß M, Sell C, Blöcher J, Kirsanow K, Rohland N, Rieger B, Kaiser E, Schier W, Pozdniakov D, Khokhlov A, Georges M, Wilde S, Powell A, Heyer E, Currat M, Reich D, Samashev Z, Parzinger H, Molodin VI, Burger J. Ancestry and demography and descendants of Iron Age nomads of the Eurasian Steppe. Nat Commun. 2017 Mar 3;8:14615. doi: 10.1038/ncomms14615. PMID: 28256537; PMCID: PMC5337992.
- Zarei F, Rajabi-Maham H. Phylogeography, genetic diversity and demographic history of the Iranian Kurdish groups based on mtDNA sequences. J Genet. 2016 Dec;95(4):767-776. doi: 10.1007/s12041-016-0692-4. PMID: 27994175.
- Madih, ‘Abbas-‘Ali. “The Kurds of Khorasan.” Iran & the Caucasus, vol. 11, no. 1, 2007, pp. 11–31. JSTOR, www.jstor.org/stable/25597312.
- Zerjal T, Wells RS, Yuldasheva N, Ruzibakiev R, Tyler-Smith C. A genetic landscape reshaped by recent events: Y-chromosomal insights into central Asia. Am J Hum Genet. 2002;71(3):466-482. doi:10.1086/342096
- Ali Amirzargar, Diego Rey. Kurds HLA Genes: Its Implications in Transplantation and Pharmacogenomics. Openmedicine Journal. 2015;2:43-47. DOI: 10.2174/1874220301401010043
- Dr. Michael Izady. Infographs, Maps and Statistics Collection. [https://gulf2000.columbia.edu/images/maps/Iran_Languages_2000_sm.png]
- Assessing the Performance of qpAdm: A Statistical Tool for Studying Population Admixture. Éadaoin Harney, Nick Patterson, David Reich, John Wakeley. bioRxiv 2020.04.09.032664; doi: https://doi.org/10.1101/2020.04.09.032664
- A dynamic 6,000-year genetic history of Eurasia’s Eastern Steppe. Choongwon Jeong, Ke Wang, Shevan Wilkin, William Timothy Treal Taylor, Bryan K. Miller, Sodnom Ulziibayar, Raphaela Stahl, Chelsea Chiovelli, Jan H. Bemmann, Florian Knolle, Nikolay Kradin, Bilikto A. Bazarov, Denis A. Miyagashev, Prokopiy B. Konovalov, Elena Zhambaltarova, Alicia Ventresca Miller, Wolfgang Haak, Stephan Schiffels, Johannes Krause, Nicole Boivin, Erdene Myagmar, Jessica Hendy, Christina Warinner. bioRxiv 2020.03.25.008078; doi: https://doi.org/10.1101/2020.03.25.008078
- Windfuhr, Gernot (1975), “Isoglosses: A Sketch on Persians and Parthians, Kurds and Medes”, Monumentum H.S. Nyberg II (Acta Iranica-5), Leiden: 457–471
- The formation of human populations in South and Central Asia, Vagheesh M. Narasimhan, Nick Patterson, Priya Moorjani, et al, 2018, http://science.sciencemag.org/content/365/6457/eaat7487.abstract
- A dynamic 6,000-year genetic history of Eurasia’s Eastern Steppe. Choongwon Jeong, Ke Wang et al. bioRxiv 2020.03.25.008078; doi: https://doi.org/10.1101/2020.03.25.008078