New blogs

Leherensuge was replaced in October 2010 by two new blogs: For what they were... we are and For what we are... they will be. Check them out.

Wednesday, May 6, 2009

Mitochondrial DNA structure radically challenged


Found at Dienekes. Original paper: G. Alexe et al. PCA and Clustering Reveal Alternate mtDNA Phylogeny of N and M Clades (DOI: 10.1007/s00239-008-9148-7) - behind paywall.

This paper, if it actually has any substance behind it, threatens to challenge our understanding of the Eurasian mtDNA tree and maybe even our certainties about what counts as a phylogenetically meaningful mutation.

Abstract:

Phylogenetic trees based on mtDNA polymorphisms are often used to infer the history of recent human migrations. However, there is no consensus on which method to use. Most methods make strong assumptions which may bias the choice of polymorphisms and result in computational complexity which limits the analysis to a few samples/polymorphisms. For example, parsimony minimizes the number of mutations, which biases the results to minimizing homoplasy events. Such biases may miss the global structure of the polymorphisms altogether, with the risk of identifying a “common” polymorphism as ancient without an internal check on whether it either is homoplasic or is identified as ancient because of sampling bias (from oversampling the population with the polymorphism). A signature of this problem is that different methods applied to the same data or the same method applied to different datasets results in different tree topologies. When the results of such analyses are combined, the consensus trees have a low internal branch consensus. We determine human mtDNA phylogeny from 1737 complete sequences using a new, direct method based on principal component analysis (PCA) and unsupervised consensus ensemble clustering. PCA identifies polymorphisms representing robust variations in the data and consensus ensemble clustering creates stable haplogroup clusters. The tree is obtained from the bifurcating network obtained when the data are split into k = 2,3,4,…,k max clusters, with equal sampling from each haplogroup. Our method assumes only that the data can be clustered into groups based on mutations, is fast, is stable to sample perturbation, uses all significant polymorphisms in the data, works for arbitrary sample sizes, and avoids sample choice and haplogroup size bias. The internal branches of our tree have a 90% consensus accuracy. 
In conclusion, our tree recreates the standard phylogeny of the N, M, L0/L1, L2, and L3 clades, confirming the African origin of modern humans and showing that the M and N clades arose in almost coincident migrations. However, the N clade haplogroups split along an East-West geographic divide, with a “European R clade” containing the haplogroups H, V, H/V, J, T, and U and a “Eurasian N subclade” including haplogroups B, R5, F, A, N9, I, W, and X. The haplogroup pairs (N9a, N9b) and (M7a, M7b) within N and M are placed in nonnearest locations in agreement with their expected large TMRCA from studies of their migrations into Japan. For comparison, we also construct consensus maximum likelihood, parsimony, neighbor joining, and UPGMA-based trees using the same polymorphisms and show that these methods give consistent results only for the clade tree. For recent branches, the consensus accuracy for these methods is in the range of 1–20%. From a comparison of our haplogroups to two chimp and one bonobo sequences, and assuming a chimp-human coalescent time of 5 million years before present, we find a human mtDNA TMRCA of 206,000 ± 14,000 years before present.
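As a rough illustration of how much that final date hangs on the calibration point, here is a minimal sketch. It assumes a strict molecular clock, so that the estimate scales linearly with the chimp-human coalescent time; that assumption is mine, not something the abstract states. The 5 million year calibration and the 206,000 ± 14,000 figure come from the abstract, while the 6.5 million year alternative is a purely hypothetical value picked for comparison.

```python
# Illustrative only: rescale a TMRCA estimate under a strict molecular clock.
# ASSUMPTION: the age scales linearly with the calibration point; the paper's
# actual dating procedure is not described in the abstract.

def rescale_tmrca(tmrca_years, old_calibration_years, new_calibration_years):
    """Scale an age estimate linearly with the calibration point."""
    return tmrca_years * new_calibration_years / old_calibration_years

# Values quoted in the abstract.
ABSTRACT_CALIBRATION = 5_000_000   # chimp-human coalescent time, years
ABSTRACT_TMRCA = 206_000           # human mtDNA TMRCA, years
ABSTRACT_ERROR = 14_000            # quoted uncertainty, years

# Hypothetical alternative calibration, for comparison only.
HYPOTHETICAL_CALIBRATION = 6_500_000

# Fraction of the chimp-human depth taken up by the human mtDNA tree.
depth_fraction = ABSTRACT_TMRCA / ABSTRACT_CALIBRATION

new_tmrca = rescale_tmrca(ABSTRACT_TMRCA, ABSTRACT_CALIBRATION, HYPOTHETICAL_CALIBRATION)
new_error = rescale_tmrca(ABSTRACT_ERROR, ABSTRACT_CALIBRATION, HYPOTHETICAL_CALIBRATION)

print(f"Human tree depth as a fraction of chimp-human depth: {depth_fraction:.4f}")
print(f"Rescaled TMRCA: {new_tmrca:,.0f} +/- {new_error:,.0f} years")
```

Under that assumption the human mtDNA tree spans about 4% of the chimp-human depth, and moving the calibration from 5 to 6.5 million years would push the estimate to roughly 267,800 ± 18,200 years. The absolute date is only as good as the calibration behind it.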

I don't have access to the full paper, so I can hardly add much more for the moment. Just that this paper basically scraps R as we know it and makes it merely a West Eurasian haplogroup (called "European" in the abstract, with no mention of the many South Asian R lineages), placing the Eastern branches of R (notably B, F and P) directly under N.

I can only guess that this article will cause a lot of discussion all around. I will try to follow it and form a more informed opinion. Update: I have just got a copy of the paper and am voraciously reading it at the moment.

5 comments:

Kepler said...

Maju, apologies, off topic: do you perhaps know how far I can get in finding out which wave brought my paternal haplogroup, which is J2, into Spain?

For instance: if I find out my clade, is there any way nowadays of determining whether it most likely arrived with a Neolithic group or later? I suppose not, or hardly.

So far I have just the 12 markers from the Genographic Project. There is not a single full match in ysearch; there are about 30 with one mutation of difference (several from Italy), not a single one of them from Spain. In another database there is a full 12-marker match with someone of - probably - Portuguese origin (Banda or the like). My family name is Spanish; I suppose a Spaniard came to Venezuela sometime between 1498 and the XIX century.

PS: the article seems interesting indeed. I wonder when we are going to have some more stable ground even at that level of genetics.

Maju said...

I assume you are asking about Y-DNA, right? Because there's also mtDNA J2.

All I know is that there are two main clades, J2a and J2b, and that they split soon after J1 and J2 did. J2b is curious because it is rather rare in West Asia and is most concentrated around it: both in Europe (especially SE Europe) and South Asia. J2 is in any case somewhat common throughout the Iberian Peninsula, with a quite uniform distribution (J1, in contrast, is concentrated around Granada and is much more common than J2 in North Africa). For these reasons I think that J2b and G2 (which might have a similar timeline too) may have pre-Neolithic origins in Europe. J2a may be Neolithic in Europe, though, and I guess it has a better chance of being of Jewish origin, for instance.

Anyhow, Portuguese and Castilian surnames are often similar and have historically been modified easily (Soares and Suárez, Esteves and Estévez, etc.).

My opinion of DNA testing is that you can only find out so much. Sometimes there may be a surprising finding, but most commonly it can direct you to a world region and that's about all. Also notice that there is relatively low participation among non-Anglos in such projects, so your matches may not show up because they are just not taking the tests.

Maju said...

I wonder when we are going to have some more stable ground even at that level of genetics.

Science is that way: what you "know" today may be challenged tomorrow. Certainty, especially absolute certainty, is a rare luxury, more typical of religious fanatics than of rational people.

Kepler said...

Eskerrik asko.

I know one can learn only so much and one haplogroup shows just a bit about the origins of one walk through the graph, but it is still fascinating for everyone.

I thought for most Venezuelans (like for me) it is particularly interesting: chances of getting this or that major "origin" region (and even continent) are big (even if there are the usual clear tendencies on the male or female side, in either case there are other good possibilities).
Also: we get to know more or less from the haplogroups where two of our (many) ancestors were in 1498 (first time Europeans arrived in America).

I agree with you that the closest matches are to be taken with a pinch of salt. In my case I have seen quite a few 11-loci matches who are Italians, but also people from Germany and Eastern Europe (with references to a "Jewish" background), one from Lebanon and Syria, and even one Norwegian case and one Brit.
That just means they may be related some 2000 years ago.
In any case, the language AND the money make for a big bias.
If the database were drawn from the general population in each country, I would be getting more matches from Spain and the whole Mediterranean.

I find it curious that the genetic differences are not bigger for the Basques; their language being what it is, I would have expected even a V-haplogroup :-)
Still, when could the blood type differences have appeared? Blood is another matter and a pretty vague tool (if a tool at all) for backtracking populations, but perhaps it tells something about the Basques. Doesn't it?

Maju said...

Let's see: Basques, Gascons, Irish, Welsh, Scots, Cornish and Bretons show a homogeneous pattern in relation to other European populations, having both more Rh- and more Y-DNA R1b. This is generally considered a persistence of what was once much more common throughout Europe, and especially Western Europe.

In other words: these Atlantic peoples are significantly less mixed with the post-Neolithic immigrants from Eastern Europe, West/Central Asia and North Africa.

You should not expect to find anything exclusive to them, just greater levels of what is older: of the substrate elements that are now somewhat more diluted in other parts of Western, Central and Northern Europe. In American "race speech": we are the "natives" and they are the "mestizos", so to say.

Anyhow, back to your main question: many Ashkenazi Jews may have Iberian ancestors, and both Ashkenazi and Sephardic Jews may also have sprung from a common origin in Anatolia - with whatever other elements they have picked up along their migrations and proselytizing activities. Of course many, possibly most, of the Jews and Muslims who lived in Iberia in 1492-93 reluctantly converted and assimilated, not without difficulties. Muslims were surely mostly of native Iberian ancestry, but Jews maybe not so much, at least by paternal lineage. As you may know, the persecution of fake converts (pejoratively called "marranos") continued for some time at the hands of the Holy Inquisition, notably under that infamous half-Jew, Tomás de Torquemada, so many chose to emigrate to America, where they surely found better opportunities.

Still, the connection, as you say, may be older and much more diffuse. Probably Jews are just highly oversampled, while other J2 carriers, mostly Mediterranean peoples, are undersampled. Do you know if your lineage is J2a or J2b? I'd say that J2b has a much better chance of being ancient European, while J2a looks Neolithic/post-Neolithic to me.