New blogs

Leherensuge was replaced in October 2010 by two new blogs: For what they were... we are and For what we are... they will be. Check them out.

Thursday, January 29, 2009

On mtDNA M and its subclades

Unlike haplogroup N, all major (and many minor) subclades of M stem directly from the M node. And there are many of them: 26 in total. This clearly suggests that M experienced a large expansion upon arrival in Asia, most likely in South Asia (based on diversity and geographical logic).

And then what?

Well, some subclades (like M8-CZ, M7, M4, etc.) have experienced multiple ramifications, a sign of new expansions. But here I'm focusing on M and its direct descendants (for the moment; I can't do everything at the same time, especially with such complex data).

One thing I noticed is that after the M node, the different subclades show varied numbers of SNPs before further division. Some have just one SNP, while others have five or even as many as 11. While it's difficult to say for sure (there's randomness at play here) whether a 2-SNP clade has coalesced for longer than a 1-SNP lineage, when you compare these with "sister" clades that have five or eleven SNPs at the root, you can reach some reasonable conclusions:

- Clades with "short" roots expanded shortly after the M expansion, maybe even along with it.
- Clades with "intermediate" roots (c. 5 SNPs) coalesced for some time before really finding their opportunity for expansion.
- Clades with "long" roots (c. 10 SNPs) clearly lagged behind their "sisters".
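The caveat about randomness can be made concrete with a simple model. This is a minimal sketch, not the method used for this post, assuming that mutations accumulate on a branch as a Poisson process: if a lineage is expected to carry λ SNPs at its root, the chance of observing exactly k is e^(-λ) λ^k / k!. With a hypothetical λ = 1.5, both 1 and 2 SNPs are common outcomes, while 5 or more, let alone 11, are unlikely, which is consistent with treating long roots as more telling than the difference between 1 and 2.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k mutations when lam are expected."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def prob_at_least(k: int, lam: float) -> float:
    """Upper tail: probability of observing k or more mutations."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


if __name__ == "__main__":
    lam = 1.5  # HYPOTHETICAL expected number of root SNPs for a "short" clade
    for k in (1, 2, 5, 11):
        print(f"P(exactly {k:2d}) = {poisson_pmf(k, lam):.4f}   "
              f"P(>= {k:2d}) = {prob_at_least(k, lam):.2e}")
```

Under this assumption, the ratio P(2)/P(1) is simply λ/2, so a 2-SNP root is nearly as likely as a 1-SNP root from the same expectation.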

The overall distribution of these root lengths can also shed some light on the expansion of macro-haplogroup M as such.
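The tally behind such a graph is simple: count how many subclades have each number of SNPs at their root. A minimal sketch, assuming a hypothetical input table; the clade names and counts below are placeholders, not the real M data, and the short/intermediate/long cut-offs (1 to 2, 3 to 7, 8 or more) are my own assumption, loosely following the groups described above.

```python
from collections import Counter

# HYPOTHETICAL input: placeholder clade names and root-SNP counts,
# used only to illustrate the tally (not the real M subclade data).
HYPOTHETICAL_ROOTS = {
    "clade_a": 1,
    "clade_b": 1,
    "clade_c": 2,
    "clade_d": 5,
    "clade_e": 11,
}


def root_length_category(snps: int) -> str:
    """Assumed cut-offs: 1-2 short, 3-7 intermediate, 8+ long."""
    if snps <= 2:
        return "short"
    if snps <= 7:
        return "intermediate"
    return "long"


def histogram(root_snps: dict[str, int]) -> Counter:
    """Number of clades for each count of SNPs at the root."""
    return Counter(root_snps.values())


def category_totals(root_snps: dict[str, int]) -> Counter:
    """Number of clades in each root-length category."""
    return Counter(root_length_category(n) for n in root_snps.values())


def print_bars(counts: Counter) -> None:
    """Text bar chart: one row per SNP count, one '#' per clade."""
    for n in range(1, max(counts) + 1):
        print(f"{n:2d} SNPs | {'#' * counts[n]}")


if __name__ == "__main__":
    print_bars(histogram(HYPOTHETICAL_ROOTS))
    print(dict(category_totals(HYPOTHETICAL_ROOTS)))
```

Empty rows are kept on purpose: gaps such as a "pause" between peaks are part of what the graph is meant to show.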

So I made this graph:

What do I gather from this? I'm not sure, but maybe it could be read as follows:

Early expansion:

Within the expansion of M, still mostly undifferentiated, some "daughter" clades were already active. These are primarily those with one or two SNPs at their roots. It would seem that:
  1. SE Asia was then colonized (M9-E, M21)
  2. New Guinea was reached (M27-29)
  3. East Asia proper was reached too (notably M7)
  4. Some derived expansion also happened within South Asia (notably M3 and M4)
The most important clades of this group in South and East Asia may appear to belong to a second moment (2 SNPs), but this is hard to discern.

Second expansion:

This second wave includes a large number of sublineages. By regions:
  1. South Asian lineages in this group are not particularly important (large), except M2. I'm not sure whether M6 (also important) belongs to this wave or represents an expansion in its own right, albeit limited to South Asia.
  2. Sahulian lineages, by contrast, are significant, including all the M lineages found in Australia. This may suggest that Australia was colonized after New Guinea (at least by M descendants).
  3. The Andamanese subclades (color-coded as SE Asian), at least M31, also belong to this moment.
  4. East Asian M8-CZ, as well as G, also expanded at this peak.

Late clades:

The only really important late clade is M1, which found its way into West Asia and beyond long after the expansion into Eastern Eurasia had happened.

In brief:

It appears that the expansion of M might have happened in two somewhat distinct waves, and that this expansion surely happened before the migration of M1 into West Eurasia. Both waves participated in the colonization of South and East Asia, but the first wave is more strongly associated with the colonization of SE Asia and New Guinea, while the second is associated instead with that of Australia and the Andaman Islands. A third, late and minor wave expanded basically into Western Eurasia and Africa (M1).

Edit: Extending the count to all Eurasian clades?

This can be done for comparison purposes. After all, M in all likelihood did not expand alone: N and R were also there.

Let's see: M is 4 SNPs derived from L3, while N is 5. This would place the N node right on the 1-SNP "peak" of the graph above. Most N-derived clades (macro-R, macro-X, macro-W and S; see my previous post on N) would fit on the 2 SNPs bar (which basically means that Australia was surely colonized in the first wave, albeit with a dominance of N subclades), while R would sit at the 3 SNPs bar, making the "pause" somewhat more dynamic and the whole expansion process somewhat more continuous.
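The alignment above is simple arithmetic: since both M and N are counted from L3, a node's position on the graph of M subclades is its distance from L3 minus M's own 4 SNPs. A minimal sketch (the function name is hypothetical, and comparing SNP counts across lineages this way assumes roughly similar mutation rates in all of them):

```python
# SNP distances from the L3 root, as stated in the text.
M_FROM_L3 = 4
N_FROM_L3 = 5


def bar_on_m_scale(snps_from_l3: int, m_from_l3: int = M_FROM_L3) -> int:
    """Bar of the graph a node falls on, counted in SNPs below the M node."""
    return snps_from_l3 - m_from_l3


if __name__ == "__main__":
    print(bar_on_m_scale(N_FROM_L3))  # 1: the N node sits on the 1-SNP peak
```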

Then what we could well call macro-F (including F, R5, R9b and an unnamed New Guinean lineage) would sit on the 4 SNPs bar, with F as such sitting on the 6 SNPs bar, wholly in the second wave of M expansion. B would also sit on the 4 SNPs bar. This suggests that SE Asia was also affected by the "second wave", but that R subclades were the most important there on this occasion.

For a West Eurasian reference, the R0-HV node would sit on the 6 SNPs bar, while the H node would be on the 8 SNPs bar, in the aftermath of the second wave, and H3 on the 9 SNPs bar. This may mean, indirectly, that M1 is just a late arrival in West Eurasia, or at least that it expanded at a late moment even for the West.
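The placements discussed in this edit can be gathered in one table. A minimal sketch: the bar positions are the ones given in the text, while the phase boundaries (bars 1 to 2 first wave, bar 3 the "pause", bars 4 to 6 second wave, 7 and above the aftermath) are my own reading of the graph, assumed for illustration only.

```python
# Bar positions (SNPs below the M node) as given in the text.
PLACEMENTS = {
    "N": 1,
    "macro-R, macro-X, macro-W, S": 2,
    "R": 3,
    "macro-F": 4,
    "B": 4,
    "F": 6,
    "R0-HV": 6,
    "H": 8,
    "H3": 9,
}


def phase(bar: int) -> str:
    """ASSUMED boundaries between the phases read off the graph."""
    if bar <= 2:
        return "first wave"
    if bar == 3:
        return "pause"
    if bar <= 6:
        return "second wave"
    return "after second wave"


if __name__ == "__main__":
    for lineage, bar in sorted(PLACEMENTS.items(), key=lambda item: item[1]):
        print(f"{bar:2d} SNPs  {lineage:32s} {phase(bar)}")
```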

A lot to think about, indeed.

Note: this kind of extrapolation based on SNP counts can only be done, if anything, with mitochondrial DNA. Chromosomes, even the rather small Y chromosome, are too large and complex and have not been properly sequenced so far, so doing the same is not really viable.


Tim said...

Not sure if this is relevant, but this study in genetic analysis by proxy claims to trace Pacific population expansion by the bacteria in people's guts...

Maju said...

Well, it's relevant... only in a very broad sense. Those studies actually seem to discuss evidence of the Austronesian expansion, which belongs to a much later period (late Neolithic to Iron Age) by all accounts.

In this series of posts I'm trying to explore instead what happened maybe 70,000 years before that, when H. sapiens had just arrived in Eurasia, the largest landmass on Earth, and began its impressive expansion.

Only two female lineages from that time have survived (either because there were only two actual foremothers or, more likely, because the descendants of the rest did not manage to pass on an mtDNA lineage, though they could have left other genetic influence). In this post I explore the basal expansion of one of those two clades: M.

In the post I will write in a minute, I will extend this exploration to the other major maternal lineage: N (and its most important descendant: R).