Genetics and Archaeogenetics of South Asia

There has been significant progress in genetic and archaeogenetic studies of Indian populations in the last five years (as of 2006); this has implications for the Indo-Aryan migration/invasion theory.

The genetic studies are ongoing, with conflicting results: some support an infusion of genetic material (Bamshad et al. 2001; Spencer Wells, Journey of Man, 2002; Basu et al. 2003; Cordaux et al. 2004) and others do not (Kivisild et al. 2003; Sengupta et al. 2005; Sahoo et al. 2006). A final picture will emerge only after critical and comparative analysis of these studies.

The studies that support an infusion of Y-lineages, in upper-caste populations in particular, usually link it to the historical arrival of Indo-Aryans. In these studies, the age of the Y-lineages in India coincides with the putative Aryan immigration period. Conflicting studies suggest that these "Indo-Aryan" lineages are in fact more diverse in lower-caste and tribal populations, even though their frequency there is lower. The classification of lineages also differs between studies: mtDNA haplogroup U2i is dubbed "Western Eurasian" in the Bamshad et al. study but "Eastern Eurasian (mostly India specific)" in the Kivisild et al. study.

However, there are still doubts over autosomal admixture analysis. It is also suggested that Indian marital traditions may have an impact on the calculation of age of Indian Y-haplogroups.

mtDNA
The largest Indian mtDNA haplogroups are M, R and U. With the possible exception of haplogroup U, which is shared with Western Eurasian populations, they seem to be native to South Asia.

Haplogroup M, which comprises c. 60% of Indian mtDNA, is actually a macro-haplogroup with many subgroups that are still poorly studied; the South Asian clades of M are mostly different from the East Asian ones.

Virtually all modern Central Asian mtDNA M lineages seem to belong to the Eastern Eurasian (Mongolian) rather than the Indian subtypes of haplogroup M, which indicates that no large-scale migration from the present Turkic-speaking populations of Central Asia to India (or vice versa) could have occurred (Kivisild 2000).

Y chromosome
In a 2004 paper, Cordaux et al. argue for independent origins of Indian caste and tribal paternal lineages: “Thus, the quantitative comparison of an extensive dataset of Y chromosome haplogroups in both Indian caste and tribal groups, as well as nongenetic information, support a scenario of independent origins of Indian caste and tribal paternal lineages, with recent immigration of caste Y lineages and subsequent bidirectional gene flow between caste and tribal groups. This conclusion contrasts with the earlier suggestion that both Indian caste and tribal Y chromosomes largely derive from the same Pleistocene genetic heritage, with only limited recent gene flow from external sources. In contrast with the Y chromosome evidence, the mtDNA evidence suggests a common origin of tribal and caste groups. It is likely that most maternal lineages largely represent the original mtDNA gene pool of India, implying that caste maternal lineages mainly derive from local tribal ancestors.”

This contrasts with the earlier work (Kivisild et al. 2003b; Cordaux et al. 2003), which emphasizes that the combined results from mtDNA, Y-chromosome and autosomal markers suggest that "Indian tribal and caste populations derive largely from the same genetic heritage of Pleistocene southern and western Asians and have received limited gene flow from external regions since the Holocene" (Kivisild 2003b).

R1a1
The haplogroup R1a1 (M17) is often linked with the ancient Kurgan (Yamna - "ямная") culture and Proto-Indo-Europeans of Southern Russia/Ukraine, who supposedly migrated to Europe, Central Asia and India between 3000 and 1000 BC (Passarino et al. 2001; Quintana-Murci et al. 2001; Wells et al. 2001).

Alternatively, the high frequency of R1a1 found in several South Indian tribes including the Chenchu and the Badagas, together with a higher R1a1-associated STR diversity in India and Iran compared with Europe and Central Asia, has been taken as evidence for an origin of R1a1 (M17) in Southern or Western Asia (Kivisild 2003b). Stephen Oppenheimer, who reports upon the results of the Human Genome Diversity Project in his book "The Real Eve: Modern Man's Journey out of Africa", comments that, "For me and for Toomas Kivisild, South Asia is logically the ultimate origin of M17 and his ancestors; and sure enough we find highest rates and greatest diversity of the M17 line in Pakistan, India, and eastern Iran, and low rates in the Caucasus. M17 is not only more diverse in South Asia than in Central Asia but diversity characterizes its presence in isolated tribal groups in the south, thus undermining any theory of M17 as a marker of a 'male Aryan Invasion of India'" (p. 152). Oppenheimer further believes that it is highly suggestive that India is the origin of the Eurasian mtDNA haplogroups which he calls the "Eurasian Eves". According to Oppenheimer it is highly probable that nearly all human maternal lineages in Europe (and similarly in East Asia) descended from only four mtDNA lines that originated in South Asia 50,000-10,000 years ago.

Unfortunately, there are not enough data to draw a final conclusion about the origin of R1a1. Doing so would require a comparative study of R1a1 haplogroup diversity in the populations of Ukraine (and/or South/Central Russia), Pakistan and India, using the same large set of microsatellite markers. So far, only one such study has been attempted, by Passarino in 2001. It employs the 49a, f/TaqI Y-specific system and a set of seven microsatellite markers to compare the diversity of the R1a1 (M17, Eu19) haplogroup in 29 world populations (including Ukraine, Poland, and India). According to Passarino (2001), “the 49a, f Ht 11 displays a major diversification in East Europe with respect to the other areas. Actually, in East Europe, all the derivatives of the 49a, f Ht 11 were observed (9 vs 6 in the "Balkans," 4 in the "Middle East," 1 in India, and 2 in West Europe). Moreover, Ukraine presents at least twice as many derivatives as the other East European populations. These findings suggest that East Europe is the place where this lineage originated or started to expand, particularly in Ukraine, which also includes a refuge area during the LGM.” However, more extensive studies, including studies of Kashmiri populations, are necessary before reliable conclusions can be drawn.

In his 2003 paper, Kivisild compares the diversity of the R1a1 (M17) haplogroup in Indian, Pakistani, Iranian, Central Asian, Czech and Estonian populations. The study shows that the diversity of R1a1 is higher in India (and in Pakistan and Iran) than among Czechs and Estonians. It should be noted, however, that more than one third of the Y-chromosome gene pool in Estonians is represented by the “Uralic” N3 haplotype, which points to a founder effect.
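The distinction between how common a haplogroup is and how diverse it is can be made concrete with a small calculation. The following is a minimal sketch using Nei's unbiased haplotype diversity, a standard textbook measure; it is not necessarily the statistic used in the studies cited above, and the populations and STR haplotypes are hypothetical values chosen only for illustration.

```python
from collections import Counter


def nei_haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is a list of hashable haplotypes (here: tuples of STR
    repeat counts) observed among carriers of one haplogroup.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_squared_freqs = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_squared_freqs)


# HYPOTHETICAL data: repeat counts at three made-up STR loci.
# Population A: haplogroup is frequent, but carriers are nearly identical.
pop_a = [(15, 12, 25)] * 8 + [(15, 12, 24)] * 2
# Population B: fewer carriers, but every carrier has a distinct haplotype.
pop_b = [(15, 12, 25), (16, 12, 25), (15, 13, 24), (14, 12, 25), (15, 11, 26)]

diversity_a = nei_haplotype_diversity(pop_a)
diversity_b = nei_haplotype_diversity(pop_b)
print(f"A: {len(pop_a)} carriers, diversity {diversity_a:.3f}")
print(f"B: {len(pop_b)} carriers, diversity {diversity_b:.3f}")
```

In this toy case population B has fewer carriers but a higher diversity than population A, which is the pattern that the diversity-based arguments rely on.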

Some new data on the diversity of R1a in Southeastern Europe (Croatia, Bosnia and Herzegovina, Serbia and Montenegro, and Macedonia) are presented in a 2005 paper by Peričić et al. (The defining mutation of R1a is SRY-1523 = SRY10831, which precedes the M17 mutation that defines R1a1.) According to this paper, the R1a haplotype shows high diversity in this area (especially in Bosnia and Herzegovina), “and the estimated range expansion at 15.8 ± 2.1 KYA, consistent with its deep Paleolithic time depth”.

Recent studies indicate that the haplogroups C5-M356, H-M69*, F*, L1 and R2 are indigenous to South Asia (Sengupta 2006: 211). According to Sengupta (2006), “our overall inference is that an early Holocene expansion in northwestern India (including the Indus Valley) contributed R1a1-M17 chromosomes both to the Central Asian and South Asian tribes prior to the arrival of the Indo-Europeans.”

A 2001 examination of male Y-DNA by Indian and American scientists indicated that higher castes are genetically closer to Western Eurasians than are individuals from lower castes, whose genetic profiles are similar to those of other Asians. According to Bamshad et al. (2001), higher-caste Telugus have a higher frequency of haplogroup 3 (R1a1) than lower castes. Haplogroup 3 is also characteristic of Eastern Europeans. In the study, Bamshad and his team wrote, "Our results demonstrate that for biparentally inherited autosomal markers, genetic distances between upper, middle, and lower castes are significantly correlated with rank; upper castes are more similar to Europeans than to Asians; and upper castes are significantly more similar to Europeans than are lower castes." There is some evidence that a few millennia ago, a group of people with (Eastern) European genetic affinities migrated into the Indian subcontinent from the northwest. In the abstract to their paper, Bamshad et al. stated, "In the most recent of these waves, Indo-European-speaking people from West Eurasia entered India from the northwest and diffused throughout the subcontinent. They purportedly admixed with or displaced indigenous Dravidic-speaking populations. Subsequently they may have established the Hindu caste system and placed themselves primarily in castes of higher rank". However, critics point out that the South Indian state of Andhra Pradesh might not be the best place for such a study. One of the upper castes, the Kshatriyas (Rajus), makes up only a minuscule part of the Telugu population. Also, historically, South Indian royal families intermarried with Central and East Indian royal families. In other words, the Kshatriyas were not as isolated as the Chenchu tribe. In the regions of present-day Andhra Pradesh, the dominant and generally feudal castes were the Kapus, Reddys and Kammas, though they were classified as Shudras.
Also, treating the presence of Brahmins in South India as proof of the dominance of Indo-European people has been questioned in light of the Brahmin migration to South India. Historical records indicate that the transition of South Indian kings from Buddhism and Jainism to Vedic Saivism and Brahmanical Hinduism resulted in Brahmins being brought from North India to perform religious duties. It has also been noted that many North Indian Brahmin families took refuge in South India to escape religious persecution at the hands of invaders. Critics also point out that the European-specific markers, however controversial their origins may be, are observed across caste lines in the north-west of India. The study also supported a classic anthropological observation: women are significantly more mobile in terms of caste and hierarchical class than men, who are barely socially mobile at all. Genetic evidence reveals that over millennia men from higher castes have married women from lower castes, but women from higher castes have rarely married men from lower castes. The researchers thus imply that caste and class are to a large extent perpetuated by women, and that this has contributed to the minimal admixture between Aryans and natives. A recent paper in Current Biology, [http://www.eva.mpg.de/genetics/pdf/CordauxCurBiol2004.pdf Cordaux et al. (2004)], confirms the Bamshad (2001) results and concludes that the paternal lineages of Indian caste groups are primarily descended from Indo-European speakers who migrated from Central Asia about 3,500 years ago.

However, other studies (Kivisild 2003a; Kivisild 2003b) have revealed that haplogroup 3 (R1a1) occurs in about half of the male population of Northwestern India and is also frequent in Western Bengal. These results, together with the fact that haplogroup 3 is much less frequent in Iran and Anatolia than it is in India, indicate that haplogroup 3 among high-caste Telugus did not necessarily originate from Eastern Europeans. The high diversity of haplogroups 3 and 9 in India suggests that these haplogroups may have originated in India (Kivisild 2003a).

Other haplogroups
The Neolithic spread of farmers to Europe from the Levant/Middle East has also been linked to 12f2 (haplogroup J) and the markers M35 (haplogroup E3b) and M201 (haplogroup G). But while M35 (E3b) is present in Europe, Anatolia, the South Caucasus and Iran, Indians generally do not have the Alu insertion (YAP+) in their Y chromosomes. The lack of YAP+ chromosomes (haplogroup E) in India suggests that M35 appeared in the Middle East only after a migration from Iran to India had taken place, but before the later migration of Near and Middle Eastern farmers to Europe (Kivisild 2003a).

Most of the pro-migration papers imply that R1a1 is the genetic marker that is representative of a migration, due to its high frequency in Eurasia. But an equally likely genetic marker is haplogroup L. This haplogroup is present in Greek, Turkish, Lebanese, Iranian, Central Asian, and Indian populations (and in Europe, see Kivisild). This marker is found in locations where written sources record the presence of Indo-European languages and peoples, i.e. Greeks, Hittites, Mitanni, Iranians and Indians. Its peak frequency is found in Indo-Iranian populations. The 'Western Eurasian' components that are found in Indian mtDNA show a distribution closer to that found in the Southern Caucasus and Middle East than to that found in Eastern Europe. There is also the question of why one should assume that only one Y haplogroup is representative of the Aryan gene pool. R1a1, R1b, J2, L and H, all of which are present in India and in Central and West Asia, are all possibilities. However, haplogroup L has a very low level of diversity in the Punjab. This is suggestive of a recent migration or expansion event in the area, and is supported by the fact that the diversity of R1a1, J2 and haplogroup C is higher in the region. Haplogroup C is supposed to be a remnant of the "Out of Africa" migration of humans, but still retains a high level of diversity. Haplogroup L is also found in South India at relatively high frequencies and has been associated by some (along with J2) with the spread of farming and Dravidian languages.

Interestingly, studies show that there has been very little mixing of the male lines between castes/clans for some time. They show distinct haplotypes even though many clans within a region have similar haplogroups. For instance, Northwest Indians exhibit mainly haplogroups R1a1, R1b, J2 and L, yet there is very little sharing of haplotypes with other castes/clans in the same region.

The J2 haplogroup is almost absent from tribals, but occurs among some Austro-Asiatic tribals (11%). The frequency of J2 is higher in South Indian castes (19%) than in North Indian castes (11%) or Pakistan (12%) (Sengupta 2006).

Autosomal markers
One more marker of Caucasian ancestry in admixed populations may be taken into consideration: the H2 haplotype of the gene MAPT. It has been shown to be Caucasian in origin, and may work as a good estimator of European admixture. “The constancy of the H2 allele frequency in Caucasian populations from the Middle East to the Orkneys suggest that its origin in European populations is ancient and coincides with the colonization of Europe.” MAPT is represented “by two distinct lineages, H1 and H2, that have diverged for as much as 3 million years and show no evidence of having recombined”. “The H2 lineage is rare in Africans, almost absent in East Asians but found at a frequency of 20% in Europeans”. There is some “evidence suggesting that Homo neanderthalensis contributed the H2 MAPT haplotype to Homo sapiens”. H2 is found in many Pakistani populations.

Interestingly, the map of the worldwide frequencies of ASPM (a brain size determinant in Homo sapiens) haplogroup D ("derived") matches the map of the H2 haplotype distribution surprisingly well. “The frequency of haplogroup D chromosomes is ... 44% in Europeans and Middle Easterners”. The coalescence age (i.e., time to the most recent common ancestor) of haplogroup D was estimated “at 5800 years, with a 95% confidence interval between 500 and 14,100 years.” One should of course take into consideration that ASPM “haplogroup D ... rose to high frequency under strong positive selection”, so the frequency of ASPM haplogroup D is expected to be higher than that of MAPT haplogroup H2. However, given that only a few Pakistani populations were sampled and that both markers (ASPM haplogroup D and MAPT haplogroup H2) are present not only in European but also in Middle Eastern populations, the distribution of these markers should be regarded only as suggestive of an eastward migration of “Caucasian peoples” (Europeans and/or Middle Easterners). Taken alone, it can hardly prove a specific Indo-Aryan migration or invasion.

Intriguingly, the well-discussed CCR5 delta 32 mutation may be older than previously suspected: it was detected in 2900-year-old skeletal remains from different burial sites in central Germany and southern Italy at a rather high allele frequency (11.9%). This mutation may therefore work as a marker of European (vs. Middle Eastern) ancestry. According to the 2002 Khaliq paper, the frequency of the CCR5 delta 32 allele ranged from 0.62% to 3.57% in Pakistani ethnic groups, which is much lower than that found in European populations (10% average frequency) and similar to that in the Middle East. One possible explanation for this geographical distribution is the migration of mutation carriers from a territory of high mutation frequency into an area where the mutation was absent.
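Allele frequencies such as those quoted above are obtained by counting alleles rather than people, since each individual carries two copies of an autosomal gene. A minimal sketch of that arithmetic follows; the genotype counts are hypothetical and are not taken from any of the cited studies.

```python
def allele_frequency(n_homozygous, n_heterozygous, n_individuals):
    """Frequency of an autosomal allele from genotype counts.

    Homozygous carriers contribute two copies, heterozygous carriers one,
    and every sampled individual contributes two chromosomes in total.
    """
    if n_individuals <= 0:
        raise ValueError("sample must contain at least one individual")
    if n_homozygous + n_heterozygous > n_individuals:
        raise ValueError("more carriers than sampled individuals")
    return (2 * n_homozygous + n_heterozygous) / (2 * n_individuals)


# HYPOTHETICAL sample: 100 people, 1 homozygous and 18 heterozygous carriers.
freq = allele_frequency(n_homozygous=1, n_heterozygous=18, n_individuals=100)
print(f"allele frequency: {freq:.1%}")
```

Note that 19% of the individuals in this made-up sample carry the allele, while the allele frequency itself is only half that, so carrier rates and allele frequencies should not be compared directly.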

South Asia and Central Asia
A recent study (Sengupta 2006) found that the “influence of Central Asia on the pre-existing gene pool was minor. The ages of accumulated microsatellite variation in the majority of Indian haplogroups exceed 10,000–15,000 years, which attests to the antiquity of regional differentiation.” and it concluded: “Our reappraisal indicates that pre-Holocene and Holocene-era—not Indo-European—expansions have shaped the distinctive South Asian Y-chromosome landscape.”
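The idea of dating a haplogroup from its accumulated microsatellite variation can be illustrated with a simple calculation. The sketch below uses the average squared difference (ASD) from an assumed founder haplotype under a symmetric single-step mutation model, in which the expected squared difference grows as mutation rate times elapsed generations. It is one textbook approach and not necessarily the estimator used by Sengupta; the repeat counts, the founder haplotype, the mutation rate and the generation time are all placeholder assumptions.

```python
def str_variance_age(repeats_by_locus, founder_repeats,
                     mutation_rate_per_generation, years_per_generation):
    """Rough age estimate from STR variation around a founder haplotype.

    For each locus, compute the mean squared difference between observed
    repeat counts and the assumed founder repeat count, average over loci,
    then divide by the per-locus mutation rate to get generations.
    """
    per_locus_asd = []
    for locus, observed in repeats_by_locus.items():
        founder = founder_repeats[locus]
        asd = sum((r - founder) ** 2 for r in observed) / len(observed)
        per_locus_asd.append(asd)
    mean_asd = sum(per_locus_asd) / len(per_locus_asd)
    generations = mean_asd / mutation_rate_per_generation
    return generations * years_per_generation


# HYPOTHETICAL repeat counts for eight chromosomes at two made-up loci.
observed = {
    "locus_1": [15, 15, 16, 14, 15, 15, 17, 15],
    "locus_2": [12, 12, 13, 12, 11, 12, 12, 12],
}
founder = {"locus_1": 15, "locus_2": 12}  # assumed ancestral haplotype

age_years = str_variance_age(
    observed, founder,
    mutation_rate_per_generation=1e-3,  # placeholder assumption
    years_per_generation=25,            # placeholder assumption
)
print(f"estimated age: about {age_years:,.0f} years")
```

The result scales inversely with the assumed mutation rate, so halving the rate doubles the estimated age; this sensitivity is one reason why age estimates for the same haplogroup can differ widely between studies.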

According to Sahoo (2006), “The sharing of some Y-chromosomal haplogroups between Indian and Central Asian populations is most parsimoniously explained by a deep, common ancestry between the two regions, with diffusion of some Indian-specific lineages northward. The Y-chromosomal data consistently suggest a largely South Asian origin for Indian caste communities and therefore argue against any major influx, from regions north and west of India, of people associated either with the development of agriculture or the spread of the Indo-Aryan language family.”