New blogs

Leherensuge was replaced in October 2010 by two new blogs: For what they were... we are and For what we are... they will be. Check them out.

Thursday, July 22, 2010

New paper on human autosomal phylogenetics


A reader points me to a new and quite interesting paper on human phylogeny, mainly from the viewpoint of autosomal DNA.

Jinchuang Xing et al. Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping. Genomics 2010. Pay per view.

A copy can be found at ZohoViewer and the supplementary material is also freely available.

Abstract

High-throughput genotyping data are useful for making inferences about human evolutionary history. However, the populations sampled to date are unevenly distributed, and some areas (e.g., South and Central Asia) have rarely been sampled in large-scale studies. To assess human genetic variation more evenly, we sampled 296 individuals from 13 worldwide populations that are not covered by previous studies. By combining these samples with a data set from our laboratory and the HapMap II samples, we assembled a final dataset of ~ 250,000 SNPs in 850 individuals from 40 populations. With more uniform sampling, the estimate of global genetic differentiation (FST) substantially decreases from ~ 16% with the HapMap II samples to ~ 11%. A panel of copy number variations typed in the same populations shows patterns of diversity similar to the SNP data, with highest diversity in African populations. This unique sample collection also permits new inferences about human evolutionary history. The comparison of haplotype variation among populations supports a single out-of-Africa migration event and suggests that the founding population of Eurasia may have been relatively large but isolated from Africans for a period of time. We also found a substantial affinity between populations from central Asia (Kyrgyzstani and Mongolian Buryat) and America, suggesting a central Asian contribution to New World founder populations.


Fig. 3

The abstract already states the most important conclusions of the paper: (1) lower genetic distances with better sampling strategies, (2) the claim of a large, distinct founder population at the origin of the Out of Africa migration and (3) the claim of greater affinity of Native Americans with Central Asians than with East Asians sensu stricto. Additionally, they emphasize (4) the finding that the West Eurasian component in South Asians is of West Asian origin rather than European.

In the graphs I have noticed a couple of other details worth mentioning: (5) that Pygmies appear more distinct than Khoisan from the bulk of the species (which is somewhat at odds with the haploid phylogeny) and (6) that the closest African populations to Eurasians are "Nilotic" groups of the Kenya-Uganda-Ituri area (neither the Horn of Africa nor the Nile Basin were sampled).
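To illustrate point (1), why more uniform sampling can lower the global FST estimate, here is a minimal sketch. It assumes a simple Nei-style definition, FST = (H_T - H_S) / H_T, for a single biallelic SNP with equally weighted populations; the paper's own estimator is not described in the abstract and may well differ. The function name `fst` and the allele frequencies are made up for illustration only.

```python
def fst(freqs):
    """Nei-style FST for one biallelic SNP (assumed estimator, not the paper's).

    freqs: allele frequency of the same allele in each sampled population.
    """
    n = len(freqs)
    # H_S: mean expected heterozygosity within populations
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / n
    # H_T: expected heterozygosity of the pooled sample (equal weights)
    p_bar = sum(freqs) / n
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # No variation at all means no differentiation to measure
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


# Hypothetical frequencies: two distant populations only...
extremes_only = [0.2, 0.8]
# ...versus the same two plus a geographically intermediate one
with_intermediate = [0.2, 0.5, 0.8]

print(fst(extremes_only))
print(fst(with_intermediate))
```

With these invented frequencies the estimate drops from about 0.36 to about 0.24 once the intermediate population is included, which is the same kind of effect the authors report when moving from the HapMap II samples to their broader panel.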

I will address some of these matters now.


The migrant Out of Africa population

The authors take some time to address the issue of the migrant population on pages 20-21:

The OoA hypothesis, proposing a single OoA bottleneck followed by an expansion into Eurasia approximately 50,000 years ago, has gained extensive support from the archaeological record and genetic studies. Nevertheless, many of the historical details of this diaspora remain unclear. A common interpretation is that the OoA bottleneck was the result of a migration of a small founding population into Eurasia. Given the difference in haplotype heterozygosity between African and non-African populations and the relationship between heterozygosity and effective population size, we can estimate the effective population size of such a founding population. Within Africa, the average 100-kb haplotype heterozygosity in our data is 0.91. Immediately outside of Africa in Europe, the Middle East, and Central Asia, the average haplotype heterozygosity is 0.82 (Figure 2). A reduction of heterozygosity from 0.91 to 0.82 in a one-generation bottleneck would require an effective population size of only 5.5 individuals. While a one-generation bottleneck is an oversimplification, these estimates indicate that an OoA bottleneck resulting from the migration of a small founding population would require an extremely small population size. However, given that the archaeological record indicates a rapid expansion of modern humans into Europe and Asia in just a few thousand years, it seems unlikely that Eurasia could be populated so quickly by a such a small founding population.

A more likely explanation for the OoA bottleneck is that Eurasia was populated by a larger population that had been relatively isolated from other modern human populations for tens of thousands of years prior to the expansion. The first fossil evidence for modern humans outside of Africa is in the Middle East at Skhul and Qafzeh between 80,000-100,000 years ago, which is at least 20,000 years prior to the Eurasian diaspora. If a population of modern humans remained in the Middle East until the expansion into Eurasia, there would have been sufficient time for genetic drift to reduce heterozygosity dramatically before the Eurasia expansion. This “Middle East isolation” hypothesis provides a robust explanation for the relative homogeneity of European and Asian populations relative to African populations (see Figures 3A-B) and is supported by a recent maximum likelihood estimate of 140,000 years ago for the time of Eurasian-West African population separation. Interestingly, a recent study of the Neandertal genome suggests that the non-African individuals, but not the Africans, contain similar amount of admixture (1-4%) with the Neandertals. The authors suggest that the admixture must have happened between the Neandertals with an ancestral non-African population before the Eurasian expansion. Given the fossil, archaeological, and genetic evidence, the Middle East isolation hypothesis warrants rigorous evaluation as whole-genome sequence data become available.
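The arithmetic behind the quoted passage can be sketched as follows. This assumes the textbook drift relation H_t = H_0 * (1 - 1/(2N))^t, which is not necessarily the exact formula the authors used; the function names and the 25-year generation time are my own assumptions. With the rounded heterozygosities quoted above, the one-generation case gives an effective size of roughly 5, the same order as the 5.5 in the paper (which may rest on unrounded values or a slightly different formula).

```python
def one_generation_size(h_before, h_after):
    """Effective size N such that h_after = h_before * (1 - 1/(2N))."""
    loss = 1.0 - h_after / h_before
    return 1.0 / (2.0 * loss)


def heterozygosity_after(h0, n_e, generations):
    """Expected heterozygosity after drifting at constant size n_e."""
    return h0 * (1.0 - 1.0 / (2.0 * n_e)) ** generations


def size_for_drift(h_before, h_after, generations):
    """Constant effective size producing the same loss over many generations."""
    per_generation_ratio = (h_after / h_before) ** (1.0 / generations)
    return 1.0 / (2.0 * (1.0 - per_generation_ratio))


# Values quoted from the paper: 0.91 inside Africa, 0.82 just outside
print(one_generation_size(0.91, 0.82))

# Assumption: 20,000 years of isolation at 25 years per generation
generations = 20_000 // 25
print(size_for_drift(0.91, 0.82, generations))
```

Under these assumptions the same loss of heterozygosity spread over 800 generations is compatible with an effective size in the low thousands rather than a handful of individuals, which is the logic behind preferring a long isolation over a single tiny founding group. A longer time depth raises that figure further.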

I must say that the real problem is assuming a time depth of a mere 50,000 years for the H. sapiens colonization of Eurasia, when that must rather be the date of the reflux into West Eurasia. The archaeological record for Asia east of Iran is inconclusive (too poor) and the genetic data, including those presented here, strongly suggest that South and East Asia were colonized before West Eurasia.

Hence we must be talking of a much greater time depth, such as 75,000-80,000 years ago or more, as most recent population genetic analyses have suggested. Certainly no less than 60,000 years ago.

The assumption the authors make is therefore wrong, so it's likely that the conclusion is also wrong.

That doesn't mean that the considerations they make, especially those regarding a very small colonizer population, do not make sense. This small group of adventurous colonists could well have colonized Asia over a much longer time, leaving very few remains precisely because they were few, and even when they grew in numbers they were still not many. The relatively poor state of Asian archaeology does not help to unravel the case in either direction, but we must remember that the Jwalapuram remains have clear African MSA affinities (and hence are likely to be a product of our species) and that these date from before the Toba event, some 74,000 years ago, which could well have reduced Eurasian heterozygosity even more. And there are other archaeological clues that, while not conclusive, may suggest an expansion into Asia as early as c. 110,000 years ago.

Sure, it would also be a good idea to ponder carefully the role of the Middle Paleolithic colonists of Palestine in the whole process, if that is possible. I have nothing against that but I still don't like their reasoning on this point.


The branching out of Eurasians and the two South Asian components

The neighbor joining trees (see fig. 3 above and also the very similar fig. S1 in the supplementary material) are one of the most interesting results of this paper and the authors are clearly proud of them.

I am going to ignore this "detail" hereafter, but I must mention that the tree in fig. S2, produced after the inclusion of a North African and two Palestinian populations, is very different. This is strange and I don't know how to handle the discrepancy. It might be a point of support for their hypothesis of a long separate coalescence in the Levant? Can't say.

The other two trees, however, produce a result that is an almost perfect fit with haploid phylogenies, with Eurasians first of all branching in two in Tropical Asia (South and East Asian branches).

Then the South/West Eurasian branch shows a division between South Indians and the rest, which I interpret as a split happening still in South Asia prior to the colonization of West Eurasia. Then Pakistanis and West Eurasians branch apart, and then the same happens with Europeans diverging from the West Asian/Caucasus population.

Some of the branches' positions, however, may be caused by later admixture, so let's be careful with that.

The authors also emphasize the finding (consistent with what we have seen in other papers) that the second South Asian component, related to West Eurasians, is essentially of West Asian/Caucasus affinity and not European.

I agree with this and I think that it is an important point to make. It seems to imply that an important genetic flow has existed from West Asia into South Asia, especially the Northwest part of it. Of course the flow may have happened at different historical and prehistoric periods, but it is important to realize that the Neolithic was surely the period when such migrations could have had the greatest impact.

In contrast, some of the "European" (darker orange) component is also visible, maybe originating in the Indo-European flows and maybe replaceable by a more specific Central Asian component (sadly Central Asia and Siberia are only sparsely sampled in this paper) if the findings of Hui Li are to be reproduced in the context of proper sampling strategies in this delicate area. Whatever the case, the European input in South Asia is very minor, even if maybe slightly larger than among West Asians/Caucasians. We can safely infer, I understand, that it reflects the real Indo-European genetic input via Central Asia.

Most importantly a clearly distinct South Asian component (purple) has been detected and is strong enough to make up 50% of the Pakistani gene pool and almost the totality of some South Indian populations. Also notice the distinctive Irula component (blue), which may reflect the particular long isolation of these tribals, in the past tentatively classified as "Negritos".

Notice also the minor but significant presence of the Indian component in SE Asia, especially in Thailand. I have on occasion noticed that some Thais seem to have a distinctive phenotype and maybe this is the explanation.


East Asians and Native Americans

On this matter I want to say that I am not totally persuaded by the authors' claim of greater Central Asian affinity of Native Americans. The main reason is that the "Central Asians" they mention, such as the Nepalese or the Kyrgyz, are possibly admixed populations that owe their position in the NJ tree to that fact.

Even the Buryats appear to show some of that admixture. In this case (and maybe in the others too) it is probably a case of Central Asian specific components indeed but components that still may reflect a very ancient admixture event in the early Upper Paleolithic process of colonization of Central Asia and the Far North.

This is a limitation of this paper: they do some chest beating about a very thorough sampling (somewhat justified indeed) but in the case of Central Asia/Siberia the sampling is lacking and the matter is left unclear.

In any case, Native Americans, or rather their ancestral founder population, do look to have coalesced in a complex, sparsely populated ancient Central Asian and Siberian landscape prior to their arrival in Beringia and subsequent colonization of America. Haploid genetics is very strongly supportive of such a scenario.

It is difficult to ascertain, however, whether their highly divergent location in the NJ tree, in the context of the East Asian branch, is due to them having diverged very early or rather (as I suspect) to their early admixture event, maybe partly shared with Central Asians and Siberians. We would need a much improved sampling strategy in those areas to be able to get some clear ideas.

Otherwise East Asians appear to show a first division between NE Asians and SE Asians, with the divide running across China. Not much more can be said, as the sample lacks sufficient coverage, especially in SE Asia and Oceania.


African curiosities

One of the details of the trees that caught my attention is that, in contrast to what happens in simplified haploid genetics, Pygmies are more distant from the rest of Humankind than Khoisan. This has an explanation, I believe, as the lineages more tightly associated with the Khoisan, such as mtDNA L0 and Y-DNA A, have representatives in NW Africa and even Arabia, indicating a protracted divergence (or repeated re-convergence) between the southern proto-Khoisanid branch and the main proto-Afro-Eurasian one. Instead, when proto-Pygmies diverged they probably did so for good, in spite of recent admixture with Bantus and some ancient lineages also shared with West Africans at minority levels.

Another such detail is that the populations most closely related to Eurasians are East Africans (Hema, Luhya, Alur). Overall the African branching process is coherent with the scenario I described here at Leherensuge some months ago.

11 comments:

Manju Edangam said...

Most importantly a clearly distinct South Asian component (purple) has been detected and is strong enough to make up 50% of the Pakistani gene pool and almost the totality of some South Indian populations

Very revealing. Isn't it? Matches with so-called South Asia specific mtDNA in that region. Autosomal analysis is not really helpful. It's a red-herring. We should concentrate on uniparental lineages only to understand linguistic distribution. As I have argued the local matrilineages establish themselves in overall genome over time even when patrilineages keep changing.

I suppose Newar has only ~25% East Asian patrilineages (compared to ~70% South Asian patrilineages). But their matrilineages are predominantly East Asian, thus overall they are closer to East Asians.

Maju said...

"Very revealing. Isn't it? Matches with so-called South Asia specific mtDNA in that region".

Yes, roughly it does. However, so far most papers had failed to detect this South Asian specificity.

"Autosomal analysis is not really helpful. It's a red-herring".

It's complementary but needs even more careful work probably. This paper achieves some of that desired level and that's why it is interesting.

You would get lost in the many many South Asian specific mtDNA lineages. You may want to group them by geography but that's artificial and may perfectly be a red herring itself. Y-DNA is even worse in producing clear results, although maybe for the opposite reason.

"We should concentrate on uniparental lineages only to understand linguistic distribution".

Associating linguistic distribution with genetics is the red herring. Even if in a few cases there may be a correlation, in most cases there is no relation whatsoever. After all, genes in most cases scattered around dozens of millennia ago, while detectable linguistic families are at most 10,000 years old, often much less, and have spread by means of elite dominance with very limited associated genetic flow.

"As I have argued the local matrilineages establish themselves in overall genome over time even when patrilineages keep changing".

While this is possible to some extent, I don't think it is the most common case. IMO Y-DNA also has more stability and time depth than has often been claimed.

For example compare this Y-DNA reconstruction with this mtDNA one. The paternal and maternal lineages criss-cross somewhat but they do keep some correlation too.

If you are thinking of R1a, I believe that a lot still has to be explained about its substructure and that its origins are in South Asia, even if some may have back-migrated from Central Asia (and even Europe maybe).

In the deep view, Y-DNA F correlates with M, while Y-DNA C and D do with N. However, after Y-DNA C back-migrated to South Asia bringing some mtDNA N with "him", mtDNA R (and probably a few other lesser N sublineages) became associated with Y-DNA IJK and especially MNOPS.

They do criss-cross but you can still track the overall pattern. At least I think I can.

"I suppose Newar has only ~25% East Asian patrilineages (compared to ~70% South Asian patrilineages). But their matrilineages are predominantly East Asian, thus overall they are closer to East Asians".

I think this kind of rule of thumb is correct in most cases (i.e. mtDNA tends to better reflect the overall ancestry) but the issue must be addressed case by case, looking at all aspects to be sure.

Manju Edangam said...

I don't think it is the most common case. IMO Y-DNA also has more stability and time depth than has often been claimed.
I suppose you are thinking mathematically and I am thinking maybe positive selection. I mean all those unique SNPs that identify with particular regions could only be stable in stagnant females of that region.

Maju said...

I am skeptical upfront about "positive selection", as I rather believe in a dynamic equilibrium in which different variants are similarly fit, which is fully compatible with neutrality expectations. Evolution tends to diversity, except where there are tight evolutionary constraints (negative or purifying selection). There are very few things that are clearly "positive", especially as the pre-existent variant is already optimally fit.

First demonstrate "positive selection" in the appropriate cases, please.

Manju Edangam said...

Okay, neither positive nor negative but neutral genotypes resulting in either positive or negative phenotypes. How about propensity to heart problems? How about low tolerance for alcohol? How about thinner retina?

What do you call the traits on which natural selection doesn't bother to act as they don't really affect reproduction but still sweep large chunk of population?

Maju said...

"How about propensity to heart problems?"

Looks like a problem of statistical insignificance. But tell me the details, because I don't follow health-related genetics much, as I think it is a mere pretext to attract funds in most cases. There might be some cases where there is a really critical difference between allele A and allele B but most are just hyped. The word "propensity" says it all to me: just a statistical risk factor like so many others.

"How about low tolerance for alcohol?"

I think this is quite trivial. How can alcohol matter to Paleolithic survival?

"How about thinner retina?"

Provide me with more details, maybe a link, please, because I have no idea what this means in practical terms, or whether it is related at all to haploid lineages and how. It may well be just another statistical fluke.

It's like that AIDS gel they are talking about in the scientific media this week: I don't care if it provides a 33% smaller chance, because that only means that a woman will take one or two relations more to get the virus, so in the end it's absolutely useless and a case of statistics-backed misinformation.

"What do you call the traits on which natural selection doesn't bother to act as they don't really affect reproduction but still sweep large chunk of population?"

I don't know what you are talking about in precise terms. Survival, at least till the 40s, affects reproduction, but what do you mean specifically? Is there any clear case of a haploid SNP that causes such differences, or is it merely a likely statistical accident? Most SNPs, haploid or not, that have been hyped by the genetic media and blogs happen in the end to be mostly irrelevant, a small influence in a large complex chain of effects.

Of all "known" eye color alleles mentioned in SNPedia only one is clearly relevant and the rest are just vague references in the literature or media hype that was later dismantled (more silently). Sure, there are genes affecting such things but it may well happen that various combos result in various equivalent effects when it comes to overall fitness. Actually that's the most normal thing happening in reality.

Manju Edangam said...

These studies appeared in the Indian weeklies. However, I don't have any links at present.

By the way, it's not thinner retina but thinner cornea. I read that in relation to 'Lasik'. According to the article, the success of Lasik operations is not very good among Indians, as most Indians have thinner corneas (or is it lenses?) compared to Europeans.

I don't know what you are talking about in precise terms.
How do you explain the spread of epicanthal fold?

Maju said...

Ok, I now get the idea about the thinner cornea: it doesn't seem to have any fitness impact "in the wild", does it?

"How do you explain the spread of epicanthal fold?"

I get now what you mean. That's random: founder effects and such. At least that's how I see it.

Manju Edangam said...

I get now what you mean. That's random: founder effects and such. At least that's how I see it.

According to this study neutral evolution is possible. I would suggest that has something to do with geography and with stagnant females in that region.

Maju said...

Sure, neutral evolution happens. But alone it does not cause fixation: it needs strong drift or founder effects (or both).

"I would suggest that has something to do with geography and with stagnant females in that region".

I'm not sure I understand your concept of "stagnant females".

I do understand that what I'd rather call "consolidated local lineages" imply demic continuity, at least to some extent. This may imply remote founder effects and drift, of course.

But there are many, many basal female lineages in India, for instance, which to me means that the area was colonized early on, with great diversification of these lineages. Instead, an area like Europe, where the lineages are of a lower tier (more derived in the phylogeny), was obviously colonized at a later moment. But since colonization the lineages may well have been similarly "stagnant" in both regions.