New blogs

Leherensuge was replaced in October 2010 by two new blogs: For what they were... we are and For what we are... they will be. Check them out.

Monday, March 22, 2010

East Asian autosomal DNA (working note)


This is a highly simplified (approximate, tentative, very rough) geographical interpretation of the HUGO consortium autosomal DNA clustering (paywalled, but someone has posted a copy HERE; please look at the details, and not just this poor map, a mere working note, before assuming too much), which produces five major components for East Asians and Melanesians at K=14. The rest are minority components (represented as circles) or South/West Asian ones (not shown here).

Continuous lines show the approximate areas with 50% or more of that component, dotted lines the areas (also approximate) with 20% or more.
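
As a rough sketch of the rule behind those lines: each population's share of a component decides whether it falls inside the continuous line (50% or more), only inside the dotted line (20% or more), or outside both. The population names and fractions below are made-up placeholders, not values from the paper, and the function names are hypothetical.

```python
from typing import Dict

# Thresholds taken from the map legend described above.
SOLID_THRESHOLD = 0.50   # continuous line: 50% or more of the component
DOTTED_THRESHOLD = 0.20  # dotted line: 20% or more of the component


def contour_class(fraction: float) -> str:
    """Return which map contour a component fraction falls inside."""
    if fraction >= SOLID_THRESHOLD:
        return "solid"
    if fraction >= DOTTED_THRESHOLD:
        return "dotted"
    return "none"


def classify(proportions: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, str]]:
    """Map every population's component fractions to a contour class."""
    return {
        population: {comp: contour_class(frac) for comp, frac in comps.items()}
        for population, comps in proportions.items()
    }


# Placeholder data for illustration only (NOT taken from the HUGO paper).
example = {
    "pop_A": {"yellow": 0.62, "blue": 0.30, "red": 0.08},
    "pop_B": {"light_green": 0.45, "blue": 0.35, "red": 0.20},
}

result = classify(example)
for population, classes in result.items():
    print(population, classes)
```

A population can sit inside the dotted line of several components at once (like pop_B here), which is why the dotted areas on the map overlap.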



Only three of the main components form a majority in some populations: the yellow component, which reaches its highest frequency among Ryukyuans; the light green component, which can be described as "Austronesian" but is also important among Tai-Kadai and Southern Han; and the dark green component, which is highest among Bougainville Melanesians, then in Eastern Lesser Sunda (Alor), and could be described as "Melanesian".

The red component is highest among Austroasiatic speakers, as well as Malaysian Malays, including Sea Dayaks (but not Proto-Malays nor Orang Asli), Javanese and Sundanese. The blue component is widespread among continental East Asian peoples (and in the Himalayas) but shows no area or ethnicity where it is most concentrated.

Minor components (big dots) correspond to the Hmong-Mien (cyan, which share the blue and light green components too), the Mlabri (light purple, a tiny Austroasiatic hunter-gatherer group), Orang-Asli Negritos (dark red), Proto-Malay (purple-blue), Land Dayaks (grey here but white in the paper) and Filipino Negritos (dark purple).

Related posts:
- East Asians originated in SE Asia
- Indonesian Y-DNA is mostly Paleolithic
- Genetics of the Mlabri, Austroasiatic hunter-gatherers in Thailand

See also the supplementary materials: even more details!

13 comments:

Manju Edangam said...

Looks like northern route. northern East Asia -> South East Asia -> South Asia -> West Asia/Europe.

SE Asians appear to have more components than South Asians.

Maju said...

Can't look like a northern route: the paper itself clearly states a southern route. Have you read the paper?

The graph also clearly shows a distinct, unique branch for East Asians and Melanesians, much more diverse in the south than in the north and with two main branches: ISEA and the rest of East Asia (this one branching into Austroasiatics and the rest, and so on).

"SE Asians appear to have more components than South Asians".

That may be because of sampling strategy (comparatively few SAs). Also, these components may reflect processes of homogenization that were necessarily more intense in the smaller region that is SA. They can't reflect ultimate ancestry but can represent Late Paleolithic/Early Neolithic status.

Manju Edangam said...

"Can't look like a northern route: the paper itself clearly states a southern route. Have you read the paper?"
They have just parroted it.

"That may be because of sampling strategy (comparatively few SAs)."
Sample sizes are similar. In fact, Aeta with lesser numbers show more components.

Forget about everything you know about human migrations. If you look at the map then what would you say about Eurasian population.

Maju said...

"They have just parroted it".

No, please! The HUGO paper was presented precisely as the confirmation of the southern route and the ML tree is unequivocal in this aspect.

"Sample sizes are similar. In fact, Aeta with lesser numbers show more components".

But only one specific component.

Anyhow, you are arbitrarily deciding that K-means clustering carries more weight than the maximum likelihood tree, which is absolutely clear about East Asians being a subset of Eurasians, with South Asians being a different subset.

This should be much more meaningful than the K-means analysis, with its clear oversampling of East Asian peoples. The fact that Aetas (a tiny population) and South Asians (1/5 of Humankind) have samples that are similar in size only shows how oversampled East Asians are.

The K-means analysis anyhow can't identify the Middle Paleolithic events but only a more recent time frame of some 10 or, at best, 20 Ka. That's what you see when you run structure on Europeans, and it is anyhow very sensitive to sampling strategies.

"Forget about everything you know about human migrations. If you look at the map then what would you say about Eurasian population".

I created the map myself based on the K-means analysis. I ignored the West and South Asian components altogether. It only applies to East Asia, even if the blue component permeates into Himalayan India, a well-known Neolithic process.

Still, I see 9 components in SE Asia and only one that is NE Asian (the blue component is in both subregions). That means that there is brutally more diversity in SEA than in NEA, what implies a Southern origin for all.

Maju said...

This greater SEA diversity is confirmed in all PC analyses, where NEA peoples (described as "Altaic" and "Han Chinese") always occupy a small cozy cluster of the scatterplot, no matter how many groups are removed.

While the PC analysis is also hypersensitive to sample sizes, there should be no doubt after looking at all the data that SEA is much more diverse than NEA.

Manju Edangam said...

That means that there is brutally more diversity in SEA than in NEA, what implies a Southern origin for all.

Agreed.

As far as I know Indian groups are not homogeneous but show high degree of substructure. But the way I understood ML graph, all the diversity is confined to India. This is not the case with SE Asian groups where different components show varying degree of presence all over Eurasia and Oceania. That mean Homo Sapiens must have taken inhospitable northern route and progressed quickly to tropical and hospitable SE Asia. There the population might have been exploded resulting back migration to northern Asia and southern Asia. Had it been southern route, SEAsian diversity would have been observed in India with its hospitable and tropical weather.

Maju said...

The way I see it, while not a perfect representation, what is clear is that, as you follow the main line from the root (bottom), it splits up in two branches: South Asians and East Asians/Melanesians.

However, due to the shape they chose for the ML tree (artistic license and also representative convenience because of the associated K-means graph), it may look like a single line from which several branches arise at different moments, when in fact each branch is a bifurcation of the whole tree.

There are "anomalies" but these are typical of admixed populations.


The perhaps rarer case is that of CEU, but I imagine it happens because of their "purity" in terms of other components.

Anyhow, check the supplemental material for more details.

At K=2 we see two populations: Africans+Europeans vs. East Asians (Indians would be intermediate), however, as East Asians are oversampled this is rather normal.

At K=3 we see three subpopulations: Africans, "Euro-Indians" and East Asians, which is a common sense division.

Then we start seeing the various East Asian clusters (and eventually the other South Asian one too): An ISEA/mainland EA divide appears already at K=4, just like the ML tree, etc. The Mlabri appear as clearly distinct only at K=8, and the Indians only show a cluster distinct from Europeans at K=14 (and never in the second set of K rounds, though I'm not sure of the difference between both).

In figures S34, S35 and S36 you can see different ML trees in which CEU cluster with Indians. Europeans cluster best in any case with IN-IL (Upper Caste Hindi from southern Rajasthan) and less closely with IN-NL (Upper Caste Hindi from Uttaranchal, I think).

In most versions, the Eurasian branch splits in two first of all: South-West Eurasians and East Asians-Melanesians. In others, though, it makes some sort of "matryoshka doll", with East Asians being the doll deep inside (i.e. a subset of a subset of a subset...).

Maju said...

Also when the ML tree variants make a "Russian doll" shape, East Asians are a subcluster of an Indian subset, which is:

·S34-C: IN-EL, IN-SP, IN-WI, IN-DR
·S35-B: IN-WI, IN-DR
·S35-C: IN-WI
·S36-B: IN-EL, IN-SP, IN-WI, IN-DR (again)

IN-WI are the Bhil tribe, sampled in West-Central India; IN-DR are Upper Caste Dravidian Telugus; IN-EL are Upper Caste Bengali; and IN-SP are Upper Caste Hindi from Uttar Pradesh.

So it would seem that East Asians are either a distinct Eurasian branch or a subset (an ancient one indeed) of these Indian populations.

terryt said...

"That mean Homo Sapiens must have taken inhospitable northern route and progressed quickly to tropical and hospitable SE Asia. There the population might have been exploded resulting back migration to northern Asia and southern Asia".

I agree with you but you're wasting your time here Manju.

Maju said...

But you sustain that hypothesis for no reason at all, Terry: just because it fits with your stubborn preconceptions.

What we see in the genetic trail is that some time after L3, a descendant, M, literally explodes in a huge firework of no fewer than 40-plus basal lineages. And that happened already in Asia (and, by the diversity rule, in South Asia). I understand that a northern route through Siberia cannot explain such phenomenon at all.

Between East Africa and India there are huge deserts, but South Asia must have been a dream destination for those survivalists, our ancestors. So when they arrived there, they thrived, and that is precisely what the genetic record tells us.

terryt said...

"But you sustain that hypothesis for no reason at all, Terry: just because it fits with your stubborn preconceptions".

I sustain that hypothesis because it most parsimoniously explains the distribution of modern haplogroups, both male and female.

"I understand that a northern route through Siberia cannot explain such phenomenon at all".

That's because mtDNA M and Y-hap F's expansion has nothing at all to do with a northern route. Try explaining mtDNA N's distribution as being southern. Or Y-hap C's.

"Between East Africa and India there are huge deserts"

Especially through Southern Arabia. However, it's quite likely that an expansion through Syria and the headwaters of the Tigris/Euphrates, for example, would have been much easier.

Maju said...

"I sustain that hypothesis because it most parsimoniously explains the distribution of modern haplogroups, both male and female".

Write something coherent on it and publish it on your own site, because I don't see how you can come to such conclusions. The Northern haplogroups are in all cases few and recent.

"That's because mtDNA M and Y-hap F's expansion has nothing at all to do with a northern route".

mtDNA M and Y-hap F's expansion is the main phenomenon happening in Eurasia early on and still today most non-Africans belong to these lineages.

But mtDNA N and Y-DNA C and D also show the same southern origin, only further east.

"Try explaining mtDNA N's distribution as being southern. Or Y-hap C's".

I have done it more than once. The center of gravity of these two lineages is in SEA and they have way too many ISEA/Sahul lineages to be from further North.

Maju said...

FYI, I have posted a more elaborate map, along with further observations and even theories HERE.