The Onge versus other Indian & SE Asian ASI proxies

The Onge are an endangered relic population of only about 100 surviving members who inhabit the Andaman Islands in the Indian Ocean. They are classified as Negritos and are related to Negritos from neighboring NE India, Burma, Cambodia, and Indonesia. The unifying physical characteristics of Negritos are short stature, curly hair, and dark pigmentation. Although short stature and dark pigmentation can be prevalent among some Indian and SE Asian tribals, the same cannot be said of the extremely curly hair seen in Andamanese populations, which is characteristic of Africans.

Due to the proximity of the Andaman Islands to Burma, the most logical time frame of migration of the Andamanese from Burma is around the Last Glacial Maximum (LGM), when the islands were separated by less than 50 miles from Burma.

The Onge, along with other related tribals, are believed to have branched off from Ancestral South Indians (ASI), one of mankind’s oldest splits, who are genetically distinct from West Eurasians, East Asians, and Australasians.

Estimates of when the Onge separated from other mainland Negritos vary from 7,000 to 50,000 years ago, with Harvard leaning towards 40,000 years. My analysis best supports the 7,000 year time frame.

The Onge are extensively used in scientific literature as a proxy for ASI when genetically modeling South Asians. This is done because of a lack of “ASI” ancient DNA from South Asia. They are, however, an imperfect ASI proxy because they are genetically very “drifted”. Drift here refers to the fact that allele frequencies change within a population from one generation to the next due to random events.

The small population size of the Onge has acted to accelerate their genetic drift, which decreases diversity within a population and drives it towards genetic uniformity over time. At the same time, genetic drift acts to increase divergence between the Onge and other populations. In other words, heterozygosity decreases over time as alleles are either entirely lost or become fixed (100% allele frequency).

Mutations can act to increase diversity, but their rate in humans is only about 10⁻⁸ per base pair per generation, which is slow relative to drift. With 3 × 10⁹ base pairs in the genome, this translates to about 30 new mutations per genome per generation, or about 0.01 mutations per generation across the 1,000,000 SNPs of a typical public dataset. Over 40,000 years (1600 generations, assuming 25 years per generation), that is only about 16 mutations across those 1,000,000 SNPs. Clearly, mutation is a weak force and alone can’t counteract the rapid rate at which alleles become either lost or fixed in small drifted populations such as the Onge.
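The arithmetic in this paragraph can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the inputs are the figures quoted in the text, and the variable names are made up for illustration.

```python
# Back-of-the-envelope mutation counts, using the figures quoted in the text.
MUTATION_RATE = 1e-8          # mutations per base pair per generation
GENOME_SIZE_BP = 3e9          # base pairs in the human genome
DATASET_SNPS = 1_000_000      # sites in a typical public SNP dataset
YEARS = 40_000
YEARS_PER_GENERATION = 25

generations = YEARS / YEARS_PER_GENERATION               # 1600 generations
per_genome_per_gen = MUTATION_RATE * GENOME_SIZE_BP      # about 30
per_dataset_per_gen = MUTATION_RATE * DATASET_SNPS       # about 0.01
per_dataset_total = per_dataset_per_gen * generations    # about 16
per_genome_total = per_genome_per_gen * generations      # about 48,000

print(f"generations:                      {generations:.0f}")
print(f"per genome, per generation:       {per_genome_per_gen:.2f}")
print(f"per 1M SNPs, per generation:      {per_dataset_per_gen:.4f}")
print(f"per 1M SNPs, over 40,000 years:   {per_dataset_total:.1f}")
print(f"per genome, over 40,000 years:    {per_genome_total:.0f}")
```

The point is that the small number (about 16) applies to the sites covered by a 1,000,000-SNP dataset, which is the scale at which the heterozygosity comparisons are made.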

Assuming there is no selection pressure on an allele, which is usually the case with non-functional intergenic alleles, and that genetic drift is the only evolutionary force acting on it, the probability that the allele will eventually become fixed is simply its allele frequency in the population. So for example, if allele A has a frequency of 85% within the Onge, then the probability that A will become fixed in due time is 85%, and the probability that A will be lost in due time is 15%. Consistent with this, my analysis has shown that in small drifted populations, alleles with low frequencies tend to become lost and alleles with high frequencies tend to become fixed in due time.

According to the Wright–Fisher model, the following formula can be used for approximating the time it takes a neutral allele to become fixed via genetic drift:

$$T_{\text{fixed}} = \frac{-4 N_e (1 - p) \ln(1 - p)}{p}$$

where $T_{\text{fixed}}$ is the time in generations, $p$ the allele frequency, and $N_e$ the effective breeding population size.
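To get a feel for the numbers, the formula can be evaluated directly. The sketch below is illustrative only: the function name is made up, and the effective population size of 500 is the value assumed for the Onge elsewhere in this post. Note that the formula gives the mean time for those alleles that do eventually fix, not for all alleles.

```python
import math


def mean_fixation_time(p: float, n_e: float) -> float:
    """Mean generations to fixation for a neutral allele that does fix.

    p   -- starting allele frequency, 0 < p <= 1
    n_e -- effective breeding population size
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    if p == 1.0:
        return 0.0  # already fixed; also avoids log(0)
    return -4.0 * n_e * (1.0 - p) * math.log(1.0 - p) / p


# Assumed effective population size of 500, as used for the Onge in this post.
for p in (0.05, 0.10, 0.30, 0.50, 0.70, 0.90):
    t = mean_fixation_time(p, 500)
    print(f"p = {p:.2f}: about {t:.0f} generations ({t * 25:.0f} years at 25 years/generation)")
```

For p = 0.5 and an effective size of 500, this works out to 2000 × ln 2, or about 1,386 generations.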

I used the PopG genetic simulation program¹ to run various simulations to estimate the effect of drift, isolation, and small population size on the heterozygosity of the Onge. The simulations, run with an average breeding population size of 500, indicate that if the Onge have in fact been isolated for 40,000 years, they would be expected to be almost homozygous at all positions, owing to the loss and fixation of alleles over the course of 1600 generations in such a small population (fig 1-4).

SIMULATIONS FOR 40,000 YEAR DRIFT

For the following simulations, I used a migration rate of 0% between populations (isolation), no selective advantage for any allele (neutral alleles), and a mating population size of 500, which is a reasonable assumption for the Onge.
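For readers without PopG, the same kind of experiment can be approximated with a minimal Wright–Fisher simulation. This is a sketch under the settings described above (no migration, no selection, no mutation, 500 diploid individuals); it is not PopG's own code, and the function name and return format are invented for illustration.

```python
import random


def simulate_drift(p0, n_individuals=500, generations=1600, rng=None):
    """Follow one neutral allele under pure drift (no migration, selection or mutation).

    Returns (outcome, generation), where outcome is 'lost', 'fixed' or
    'segregating' and generation is when the run stopped.
    """
    rng = rng or random.Random()
    n_copies = 2 * n_individuals       # diploid: two gene copies per individual
    count = round(p0 * n_copies)       # starting number of copies of the allele
    for generation in range(generations):
        if count == 0:
            return "lost", generation
        if count == n_copies:
            return "fixed", generation
        p = count / n_copies
        # Binomial sampling: each gene copy in the next generation is drawn
        # independently from the current allele frequency.
        count = sum(rng.random() < p for _ in range(n_copies))
    if count == 0:
        return "lost", generations
    if count == n_copies:
        return "fixed", generations
    return "segregating", generations


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a rerun gives the same ten runs
    for run in range(10):
        outcome, generation = simulate_drift(0.30, rng=rng)
        print(f"run {run + 1}: {outcome} at generation {generation}")
```

Ten runs is a small sample, so the proportions lost and fixed will vary noticeably from one batch of runs to the next.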

As I had suspected, the simulations for rarer alleles (initial frequencies of 30% or lower) showed that they were mostly lost within about 1400 generations, with a few fixed. Out of a total of 30 simulations, only 3 showed the position remaining heterozygous after 1600 generations (fig 1 – 3). In other words, heterozygosity was lost at 90% of the positions within 1600 generations for alleles with a frequency of 30% or less.

Fig 1 – Ten simulations using an initial allele frequency of 5%. The allele was lost in 100% of runs within about 700 generations.

Fig 2 – Ten simulations using an initial allele frequency of 10%. The allele was lost in 80% of runs within about 800 generations.

Fig 3 – Ten simulations using an initial allele frequency of 30%. The allele was lost in 70% of runs within about 1400 generations and fixed in 20%.

For more common alleles, with allele frequencies of 50% or higher within the Onge (in other words, Onge signature variants), the probability of the alleles becoming fixed increases, as shown in fig 4.

Fig 3 – Ten simulations using an initial allele frequency of 50%. The allele was lost in 20% of runs within about 1600 generations and fixed in 60%.

Fig 4 – Ten simulations using an initial allele frequency of 70%. The allele was lost in 40% of runs within about 1400 generations and fixed in 60%.

SIMULATIONS FOR 10,000 YEAR DRIFT

The above simulations, run under the assumption that the Onge separated from their mainland SE Asian relatives 40,000 years ago with no subsequent migration into the Andamans, indicate that the Onge should have lost almost all their heterozygosity by now, since the vast majority of their alleles should have become either fixed or lost. Reality tells a different story: heterozygosity within the Onge is not that different from that of mainland Indian and SE Asian populations, as shown in fig 5, which was obtained with PLINK using around 100K intersecting variants.

Surprisingly, the Onge showed substantially more heterozygosity than populations such as the Mbuti, Nganasan, Yakut, Balochi, Negev Desert Bedouins, Zoroastrians, Saudis, and Sardinians. This is inconsistent with the Onge having separated from mainland Asian populations around 40,000 years ago with no significant migrations into the Andaman Islands since.
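For clarity on what is being compared, heterozygosity here is simply the share of genotyped positions at which an individual carries two different alleles. The sketch below shows that calculation on toy data. It assumes genotypes coded as 0, 1, or 2 copies of one allele, with None for missing calls; the function names and sample values are hypothetical, and this is not PLINK's own procedure (PLINK's --het option reports per-sample homozygous genotype counts, from which similar quantities can be derived).

```python
def observed_heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous.

    genotypes -- sequence of 0, 1 or 2 (copies of one allele), None = missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(1 for g in called if g == 1) / len(called)


def population_heterozygosity(individuals):
    """Mean observed heterozygosity across the individuals of one population."""
    rates = [observed_heterozygosity(g) for g in individuals]
    return sum(rates) / len(rates)


# Toy, made-up genotypes for two individuals at five sites.
toy_population = [
    [0, 1, 2, 1, None],
    [1, 1, 0, 2, 2],
]
het = population_heterozygosity(toy_population)
print(f"mean heterozygosity: {het:.3f}")
print(f"mean homozygosity:   {1 - het:.3f}")
```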

Based on the relatively high heterozygosity among the Onge, it is much more likely that they departed mainland SE Asia after the LGM, perhaps around 7,000 to 10,000 years ago. Genetic simulations run using PopG support this scenario: for alleles common in the Onge (allele frequency of 50% or higher, fig 6-8), only 6 out of 30 simulations (20%) showed alleles becoming fixed within 400 generations (10,000 years), and none showed alleles lost.
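A rough analytic cross-check on the two scenarios is the standard expectation for an idealized, isolated Wright–Fisher population, in which heterozygosity shrinks by a factor of 1 − 1/(2N) each generation. The sketch below applies that textbook result with the population size of 500 and 25-year generations assumed in this post; it ignores mutation, migration, and changes in population size, and the function name is made up.

```python
def heterozygosity_retained(generations: int, n_e: float) -> float:
    """Expected fraction of starting heterozygosity left after pure drift.

    Uses H_t / H_0 = (1 - 1/(2 * n_e)) ** t for an idealized isolated population.
    """
    return (1.0 - 1.0 / (2.0 * n_e)) ** generations


N_E = 500                   # assumed effective population size
YEARS_PER_GENERATION = 25   # assumed generation time

for years in (10_000, 40_000):
    t = years // YEARS_PER_GENERATION
    kept = heterozygosity_retained(t, N_E)
    print(f"{years} years ({t} generations): about {kept:.0%} of heterozygosity retained")
```

Under these assumptions, roughly two-thirds of the starting heterozygosity is expected to survive 400 generations, against about a fifth after 1600 generations.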

Fig 5 – Normalized homozygosity for various populations obtained using PLINK with 100K intersecting variants.

Fig 6 – Ten simulations for an effective breeding population of 500, and alleles common within the Onge with a frequency of 90%.

Fig 7 – Ten simulations for an effective breeding population of 500, and alleles common to Onge with a frequency of 70%.

Fig 8 – Ten simulations for an effective breeding population of 500, and alleles with a frequency of 50% within the Onge.

Fig 9 – Ten simulations for an effective breeding population of 500, and alleles with a frequency of 30% within the Onge.

Fig 10 – Ten simulations for an effective breeding population of 500, and alleles with a frequency of 10% within the Onge.

ALLELE SHARING BETWEEN VARIOUS ASI PROXIES AND EURASIANS

IBS comparisons were performed using PLINK to determine allele sharing between the Onge and neighboring mainland Asian populations. These showed that the Onge are most closely related to NE Indian tribals such as the Santhal and Ho (fig 11). Their accelerated drift, due to a rather lengthy period of isolation (estimated above at about 7,000 to 10,000 years) and a small population size, makes them a less than perfect proxy for ASI in Asian populations. In other words, their true magnitude of relatedness to mainland Asians is somewhat masked by their drift. This can be seen when we use the 3-population test (f3) to compare their signal of admixture with Asians versus that of some of the other ASI proxies.
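As an illustration of what an IBS (identity-by-state) comparison measures, the sketch below computes the average proportion of alleles two individuals share across sites. It assumes genotypes coded as 0, 1, or 2 copies of one allele with None for missing calls; the function name and toy values are hypothetical, and the scoring is meant to mirror the usual pairwise IBS proportion rather than reproduce PLINK's output.

```python
def ibs_similarity(g1, g2):
    """Average proportion of alleles shared identical-by-state between two samples.

    At each site called in both samples, 2 alleles are shared if the genotypes
    match, 1 if they differ by one copy, and 0 if they are opposite homozygotes.
    """
    shared = 0
    sites = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip sites missing in either sample
        shared += 2 - abs(a - b)
        sites += 1
    if sites == 0:
        raise ValueError("no sites called in both samples")
    return shared / (2 * sites)


# Toy, made-up genotypes at four sites.
sample_a = [0, 1, 2, None]
sample_b = [1, 1, 0, 2]
print(f"IBS similarity: {ibs_similarity(sample_a, sample_b):.2f}")
```

Population-level comparisons then average such pairwise values over all pairs of individuals drawn from the two populations.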

Fig 11 – IBS comparison of Onge with various Asian populations using about 40K intersecting SNPs.

The 3-population (f3) test is integrated into ADMIXTOOLS, which is available at the Reich Lab website. It is a formal test of admixture and can provide evidence of admixture even if the gene flow events occurred hundreds of generations ago. Here the test f3(Target; ASI, Mbuti) was performed to rank various Eurasian populations according to the signal of gene flow from various ASI proxies from India and SE Asia. Since Mbuti, which is an outgroup to Eurasian populations, was used as one of the two donor populations, a negative f3 or Z value, which would indicate admixture from the two donor populations into the target, was not expected. The purpose of using an outgroup is to be able to attribute the strength of the f3 signal to the ASI donor only.
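The quantity behind the test can be sketched in a few lines. At each SNP, f3(Target; A, B) is the product of the allele frequency differences (target − A) and (target − B), averaged over SNPs. The version below is a naive illustration on made-up frequencies: it omits the finite-sample correction and the block jackknife used to obtain Z values in ADMIXTOOLS, and the function name is hypothetical.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; source_a, source_b) from per-SNP allele frequencies."""
    if not (len(target) == len(source_a) == len(source_b)) or not target:
        raise ValueError("need three non-empty frequency lists of equal length")
    terms = [(c - a) * (c - b) for c, a, b in zip(target, source_a, source_b)]
    return sum(terms) / len(terms)


# Toy, made-up allele frequencies at two SNPs.
pop_a = [0.1, 0.9]
pop_b = [0.9, 0.1]
mixed = [0.5, 0.5]   # an even mixture of pop_a and pop_b
print(f"f3(mixed; pop_a, pop_b) = {f3(mixed, pop_a, pop_b):.3f}")   # negative: admixture signal
print(f"f3(pop_a; mixed, pop_b) = {f3(pop_a, mixed, pop_b):.3f}")   # positive: no such signal
```

A target that sits between the two sources in allele frequency gives a negative value, while drift specific to the target pushes the value up.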

Fig 12 – Plots of 1 – [f3(Target; ASI, Mbuti)] using Indian and SE Asian tribals as ASI proxies. As expected, the signal using Onge as an ASI proxy was weaker than with other tribals, due to accelerated drift within the Onge. Here higher values indicate a higher signal of admixture.

The Santhal, a NE Indian tribal group, along with Burmese, Brunei_Muruts, and Onge were used as ASI proxies (fig 12). Generally, for West and South Asians, the highest signal of admixture was obtained using Santhal as the ASI donor. For substantially East Eurasian admixed populations, the highest signal of admixture was obtained using Brunei and Burmese tribals as the ASI donors (fig 13).

Fig 13 – Combined plot of 1 – [f3(Target; ASI, Mbuti)] using Indian and SE Asian tribals as ASI proxies to visualize the comparison. As expected, the signal using Onge as an ASI proxy was weaker than with other tribals, due to accelerated drift within the Onge. Here higher values indicate a higher signal of admixture.

Figures 12 & 13 show that West Asians have higher signals of ASI admixture than Europeans. Among West Asians, Kurds and Iranians had the highest signals of ASI admixture, although Zoroastrians from Iran had a considerably lower signal. The most likely factors contributing to less allele sharing between the Onge and various Eurasian populations, as compared with the aforementioned ASI proxies, are a combination of exaggerated drift due to the Onge’s small population and relative isolation, as well as shielding from the multi-directional gene flows that must have taken place across Eurasia over the past 10,000 years.

REFERENCES:

Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, et al. Ancient admixture in human history. Genetics. 2012; 192: 1065–1093. https://doi.org/10.1534/genetics.112.145037 PMID: 22960212
PopG version 4.0, Ben Zawadzki, University of Washington
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics. 2007; 81.