New blogs

Leherensuge was replaced in October 2010 by two new blogs: For what they were... we are and For what we are... they will be. Check them out.

Thursday, August 26, 2010

R1b1b2a1 is almost unique to West Europe

[Typo: in the maps M529, also known as L21, is wrongly written as M259. My apologies]

This is one of the virtues of Myres' paper (which I mentioned yesterday): it makes a somewhat clearer phylogenetic subdivision, emphasizing the difference between West European R1b1b2a and other R1b or R1b1b2, often blurred in previous papers, causing great confusion even to researchers themselves.

A defect is that instead of using a standard name for the defining SNP (L51/S167 per ISOGG) they chose to name it M412. However Argiedude says it is the same marker and I imagine it is. [Update: confirmed: the rs number indicates it is the same SNP].

Another virtue is that some of the substructure of R1b1b2a1 is also mapped, which covers pretty well the Northern and Italian area of spread of this lineage and even some relatively unmapped areas of SW Europe, especially France.

In any case the apparent structure is curious, so I got the supplementary table S4 (supp. material is freely accessible) and made this map:


Notice that, following Argiedude, M412 stands for L51 and M529 stands for L21, which, if confirmed, would give the following equivalences:
  • R1b1b2a-M412 = R1b1b2a1 (L51)
  • R1b1b2a1a2-M529 = R1b1b2a1a2f (L21)
Even if not confirmed, the equivalence should be approximate anyhow.

I'm sorry for the horrible color palette but it's my first attempt to make pie charts with Open Office spreadsheet gadget. Next time I'll do better I hope.

Notice also that I did not use all the samples: in the cases of small countries or less relevant regions I arbitrarily kept some and discarded others.

Finally, notice that the pie charts represent only proportions within R1b1b2 and say nothing of the frequency of the lineage overall, which in most of East Europe and West Asia (excepting Turkey and a few neighbors) is extremely low.

The apparent structure of R1b1b2a1

The most apparent structure is, as we already knew, the rather different distribution of R1b1b2a1a1 and R1b1b2a1a2. The first one (color coded brown and light blue) is dominant in the North and rather rare in the South - hence: 'North clade' for short hereafter. The second one (dark green, light green, purple and light orange) is by comparison not just more frequent in the South but probably more diverse as well - hence: 'South clade'. However it is also found in the North.

Then there are some transitional "remnants": R1b1b2a1a* (L11) and R1b1b2a1* (M412). These should be informative (meaning some extra diversity at their structural levels) in order to infer the history of the haplogroup.

Per the hierarchical distribution seen here and diversity data from older works, the most likely origin of R1b1b2 as a whole is Anatolia.

Then R1b1b2a1* (M412) (yellow) is suggestive of a Mid-Danubian (or Italian or Iberian) coalescence.

R1b1b2a1a* (L11) (middle green) is suggestive of a West-Central European (or SW European) coalescence. More data on the Pyrenean region would clarify this maybe.

And, after this layer, comes the division into the widespread Southern and Northern clades mentioned before.

A reasonable interpretation is that the lineage traveled relatively fast up the Danube (and/or via North Italy onto the SW), then branched out into the two major North/South clades. These two lower level lineages are in fact the two main stars of this demographic expansion.

My bet is that this represents a wave of colonization of Europe (when?) with secondary expansions from SW Europe (Franco-Cantabrian region possibly) and Central Europe (Rhine-Danube region I presume). There are several scenarios that can account for this, essentially Paleolithic (or pushing into the Epipaleolithic).

I don't see clearly how this structure could account for a Neolithic spread, really: no Mediterranean Neolithic pattern is apparent at all and Danubian limited expansion cannot account for any spread to SW Europe, certainly not South of the Loire and certainly not at the frequencies it is found there (nor in Britain/Ireland either). Claiming a Neolithic spread of R1b1b2 with this structure can only be done from a very shallow understanding of Neolithic archaeology and prehistory overall.

The main demographic expansion we know of in the European Upper Paleolithic is the one after the Last Glacial Maximum, when Magdalenian culture expanded from the Franco-Cantabrian region, both northwards to Central Europe and, later, southwards into Iberia. This led to cultural divergence into the Epipaleolithic, with the important expansion into the newly available areas of the Far North, earlier covered in ice. Within the Epipaleolithic some further cultural flows are detected: from the Franco-Cantabrian region into Iberia (Azilian) and from somewhere in Mid-West Europe into the Southwest (Sauveterre-Tardenoisian).

Also, back to the LGM, Magdalenian techno-culture may ultimately have drawn its inspiration from NW Europe, but in any case it was mediated by the warmer and richer Franco-Cantabrian region, where the culture properly flourished.

It's difficult to reconstruct in detail but, as far as I can tell, the two main North/South clades must have expanded in the Magdalenian period (one from the Franco-Cantabrian region, the other from Central-NW Europe itself) and also in the subsequent Epipaleolithic. The Neolithic does not seem able to account for much but may have helped to shake the board a bit, especially in an East-to-West direction.


Frequency maps

Selected frequency maps from the paper

Notice that the expansion of the South clade to the Atlantic islands does not invalidate its southern character and probably represents an Epipaleolithic-to-Neolithic spread.

Notice also the large amount of unclassified Southern clade in Iberia. The area around the Pyrenees was not really sampled in this study and therefore it is distorted by neighbors ("South France", looking more like SE France, Valencia and Cantabria).

In order to appreciate better the real thing in this aspect it's probably good to take a look at Cruciani 2010, who did bother to sample near the Pyrenees and gets maybe better (or at least complementary) maps illustrating the same problem.


Update: I superimposed (with complementary colors) the South (red) and North (blue) clades from the frequency maps above. However in order to account for the differences of frequency, I had to lighten the blue shade (North clade) because the scales are different. Take it as an "artist's impression" anyhow:



Update (Aug 27):

Here is a hopefully better version of the map at the beginning of this post:


I took special care to give each distinct clade a specific color range for easier visualization. All R1b1b2a1 (M412/L51/S167) seems to have coalesced in the Central-to-Western European area but the real expansion seems to have happened after this haplogroup split in two, which I dubbed the North and South clades.

And this is my reconstruction of the haplogroup expansion:

Color coded as above


Update (Aug 28):

Take a peek at the comment section, where I briefly discuss molecular clock difficulties and also the only possible Neolithic scenario for R1b1b2a1a2 (South clade): a massive demographic expansion in the context of Megalithism.

Rejecting or confirming this would require more research into the structure hidden "under the asterisk" in SW Europe. At the moment only two minimally-sized sub-haplogroups are known: Basque/Gascon-specific R1b1b2a1a2b and sub-Pyrenean R1b1b2a1a2c (Gascon, Catalan, etc.). This alone gives the highest structural diversity to the Pyrenean region; however, most of the South clade remains unresolved (hidden under the asterisk), both in the Pyrenean area and in Iberia proper. The key issue to solve would be whether R1b1b2a1a2 is most diverse at the Pyrenees, which would favor a Paleolithic spread scenario, or in West Iberia (and Brittany/West France), which would favor a Neolithic-Megalithic spread scenario instead.

It is also perhaps worth recalling here the excellent STR work of Laura Morelli earlier this year, which was discussed in this article.

Importantly, this graph (annotated by me):


The graph is suggestive of the existence of another distinct "West Asian" haplogroup "under the asterisk" (which I labeled "R1b1b2a2?") and of a possible Balkan, rather than Anatolian, origin for the R1b1b2 clade.

If so, this would correlate with the high diversity of the (much smaller) brother haplogroup R1b1a in the Italy-West Asia arc (as well as in Central Africa) and would suggest a slightly different origin and scatter for R1b as a whole (ref 1, ref 2).

32 comments:

terryt said...

Nice work Maju. And nothing wrong with the colour of the pie charts.

The reason why I keep on about the Austronesians is that I believe we can learn a great deal about human migration in general from studying that migration, for several reasons. First it was into relatively discrete islands, so it's easier to see more detail. We can extrapolate what we discover into more contiguous regions. Secondly, in many cases it was into previously uninhabited islands. Again we can extrapolate to other expansions into previously unoccupied regions.

Maju said...

"And nothing wrong with the colour of the pie charts".

Dark colors are hard to differentiate, bright colors hurt, and I would have liked to give similar colors to subclades of major clades so you can easily see the whole set.

But well...

"The reason why I keep on about the Austronesians"...

Don't you dare hijack this thread with your Austronesian obsession too, thanks.

aargiedude said...

I'm reading with great interest everything you're saying about these studies. My compliments also on the maps.

YeomanDroid said...

Very nice job on your maps. Is it possible to get the estimated age range of each group labeled?

Maju said...

"Is it possible to get the estimated age range of each group labeled?"

It is possible to get a hunch with all the spicing of academic formulas... but still a hunch. This is called the Molecular Clock (MC) hypothesis, sometimes also "TMRCA" (that stands for time to most recent common ancestor).

Different authors have different opinions on how fast the MC ticks or, more exactly, on how this ticking should be corrected, if at all, to account for the suppression of novel (and hence minority) mutations by mere drift. In general the so-called "evolutionary rate" of Zhivotovsky and others is favored but some would prefer the so-called "pedigree rate" (all extant mutations have expanded wildly, while the rest died off), which makes no sense to me except for father-son estimates and little more.

I have reasons to think it can be even slower than the 'evolutionary rate', because that is what research on the animal molecular clock seems to be saying, especially on primates (ref 1, ref 2). The main reason is that while the usual date for the Pan-Homo divergence in MC works is taken to be between as little as 4.5 million years and some 7 million years, the real divergence should rather be 8 to 10 million years.

Something as basic as this can alter everything, as it's the fundamental reference.

So MC estimates are, at least for the moment, nothing more than an erudite guess.
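To give a feel for how much the calibration point matters, here is a rough sketch. It assumes (and this is a simplification of mine, not something taken from any of the cited works) that an age estimate scales in direct proportion to the Pan-Homo divergence date used as reference. The function name is made up for the illustration; the dates are the ones mentioned above.

```python
def rescaling_range(assumed_ma, revised_ma):
    """Smallest and largest factor by which ages would be stretched.

    Both arguments are (low, high) divergence dates in millions of years.
    ASSUMPTION: ages scale linearly with the calibration date.
    """
    assumed_low, assumed_high = assumed_ma
    revised_low, revised_high = revised_ma
    smallest = revised_low / assumed_high   # youngest revised vs. oldest assumed
    largest = revised_high / assumed_low    # oldest revised vs. youngest assumed
    return smallest, largest


# 4.5-7 Ma as commonly used, 8-10 Ma as argued in this comment
low, high = rescaling_range((4.5, 7.0), (8.0, 10.0))
print(f"ages would be multiplied by {low:.2f} to {high:.2f}")
```

Under that linear assumption, any published age would be stretched by somewhere between roughly a seventh and more than double, which is why I treat the figures as guesses.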

My approach is different: I try to reconstruct the pattern of spread from the structure and then see what prehistoric scenarios fit, if any. This is easier to do in Europe, where the archaeological record is quite dense.

In this case, it'd seem like the easiest scenario is for a post-LGM (Magdalenian+Epipaleolithic) spread for R1b1b2a1 major subclades.

I have been considering a hypothetical Neolithic scenario, of course, and the only possible one would imply a massive demographic flow associated with Megalithism (South clade only; the North clade may be explained somewhat in the LBK context). A key to confirming or dispelling this alternative scenario would be to research the SW European structure of R1b1b2a1a2*. So far, the SW component of this clade seems most diverse and is likely to be even more diverse "under the asterisk" but, depending on where the greatest diversity is (at the Pyrenees or in West Iberia/Brittany), it would suggest a Paleolithic or a Megalithic scenario.

Argiedude's STR diversity maps do not support either scenario clearly, so we really need to dig into the substructure of R1b1b2a1a2 beyond the asterisk in order to clarify this matter.

Maju said...

I added another small update with some considerations re. possible scenarios and including a reference to Morelli-2010, an important R1b work that did not get much press/blog coverage but that seems important to me.

aargiedude said...

Between the Myres and Cruciani studies, North Italy has 45 U152 samples out of 78 R1b1b2 (58%). From other studies, mainly yhrd, I estimated North Italy's R1b1b2 at exactly 50%. This results in North Italy having 29% U152 as a percentage of their entire y-dna.
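The arithmetic above can be sketched in a few lines. The function name is just for illustration; the inputs (45 of 78, and 50% R1b1b2 overall) are the figures quoted in this comment.

```python
def share_of_total_ydna(subclade_count, parent_count, parent_freq):
    """Return (share within the parent clade, share of all Y-DNA)."""
    within_parent = subclade_count / parent_count
    return within_parent, within_parent * parent_freq


# North Italy: 45 U152 out of 78 R1b1b2, with R1b1b2 at 50% of all Y-DNA
within, overall = share_of_total_ydna(45, 78, 0.50)
print(f"{within:.0%} of R1b1b2, {overall:.0%} of all Y-DNA")
```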

Switzerland has 18% U152 (n=175), southeast France has 17% U152 (n=367), and central Italy has 18% U152 (n=262). Slovenia has 5% U152 (n=205).

From central Italy's 18% it then descends to 10% in south Italy, and it remains at 10% in Sicily. It's hard to calculate any other region in France besides the southeast, so as a proxy I'll use northeast Spain. From southeast France's 18%, the frequency descends to 10% in northeast Spain. And from Switzerland's 18% it then drops to 10% in west and south Germany.

U152 is clearly centered in North Italy. Picture a 4-sided pyramid with 3 sides sloping down and the eastern side imploded.

-----------------

Corsica has 32% U152, but with a very small sample size of just 28. Most of their R1b1b2 belongs to U152, which is telling, because across Italy, even in Sicily and Sardinia, between 50% and 60% of their R1b1b2 belong to U152, while instead, in France, only 25% of their R1b1b2 belongs to U152, including southeast France, where it's exactly 25%. So this is yet another component of Corsica's y-dna that points to an Italian origin for their people. Everything else about Corsica's y-dna also previously indicated an origin from Italy, preferably from the part of Italy closest to Corsica: central Italy.

Maju said...

"U152 is clearly centered in North Italy".

I agree. I also had that impression when you sent your modified spreadsheets.

Cruciani-2010 (link to full paper in main article) maps the highest frequency of R1b1b2a1a2d-U152 (that he calls R1b1b2h) around Genoa, with a much sharper cline to the East than to the West (France). R1b1b2a1a1-U106 is in turn most frequent in West Germany.

However U152 is very much scattered, also into Central Europe, France and South England, and is even important in the little R1b1b2 that may exist in Eastern Europe.

So this lineage represents, IMO, the main expansion line of its parent lineage R1b1b2a1a2 ('South clade'), which headed eastwards and northwards in whatever process (Magdalenian expansion or your choice).

Re. Corsica: if you look at Cruciani's figures, similar densities are only found in North Italy, with Central Italy having only some 18% of the whole Y-DNA pool, a value lower than the French sample (20%).

aargiedude said...

M269 L23- (version 2, better).gif

I first made a map of M269* exactly 1 year ago. Now, with the Myres study, which tested for this haplogroup for the first time ever, I've made what I think is a vastly better version than my original map. The Myres samples make up about 40% of the samples I used in building the map, but the populations they covered were different than the other samples I used (aka more eastern), so this is a revealing new look at the distribution of M269*. I don't want to spoil the plot, so I won't say anything further. :)

Maju said...

What does the map indicate, Argie? Diversity, percentage? Is the comma a decimal marker (as in Spanish, French...) or just a sequential comma? I'm sorry I don't really understand what the map means.

"I don't want to spoil the plot, so I won't say anything further. :)"

Oh, please, spoil us. :D

aargiedude said...

I was sloppy, he he. It's a map of frequencies, calculated over the entire y-dna (it's not a percentage over their R1b1b2 samples). And the commas are decimal separators: if it says 3,5 then it means 3.5%.

Maju said...

This would seem to imply that the majority of Turkish R1b1b2 is R1b1b2a(xR1b1b2a1).

It is suggestive of minor expansions at the R1b1b2* stage but they are difficult to understand without a haplotype tree (except the Ashkenazi cluster that would seem a specific founder effect).

There is one by Morelli (included in my original post) but it does not say anything about the Cyprus and Algeria eccentric hotspots. It does seem to suggest however that there is a second parallel haplogroup to R1b1b2a1 in, essentially, Anatolia. We see it as R1b1b2a* for lack of current resolution but it's probably an 'R1b1b2a2' in full right.

aargiedude said...

"There is one by Morelli (included in my original post) but it does not say anything about the Cyprus and Algeria eccentric hotspots."

But note that the sample sizes are amongst the smallest for those regions, just 67 and 15. It's a statistical anomaly. Anyhow, finding 1 out of 15 in Cyprus is indicative that M269* must have a decent presence there, to have already been spotted in just 15 samples, but I'll bet it's probably less than 6,7%.

Ok, I looked at a study of Cyprus by Capelli, with 65 samples, and there were 2 candidates for R1b1b2* (393=12), which would put it around 3%, assuming none of these are in fact L23*, which to me is probably very likely.
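To put a number on how loose these small samples are, here is a minimal sketch using a Wilson score interval. That choice of interval is an assumption made for this illustration, not something used in any of the studies; the counts (1 of 15, and 2 of 65) are the ones mentioned above.

```python
from math import sqrt


def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion hits/n."""
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


for hits, n in [(1, 15), (2, 65)]:
    lo, hi = wilson_interval(hits, n)
    print(f"{hits}/{n}: observed {hits / n:.1%}, plausible range {lo:.1%} to {hi:.1%}")
```

With only 15 samples the plausible range is very wide, so the observed 6,7% says little more than "present at some non-trivial level".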

The figure that does stand out for me is Iran and the Tatar/Bashkir region, though in the latter, M269* was found only in Bashkirs, but not in the 300+ Tatars, Udmurts, and other ethnic groups of the region. And the Bashkirs are the weirdos who had, what was it, 20% or something U152? Bizarre.

Maju said...

Good caveats, thanks.

The Bashkirs are totally anomalous in East Europe, being very high in R1b overall. They must have got some kind of founder effect but who knows how or when!

Even if they speak Turkic nowadays, that does not seem to have been their original language, with "Hungarian" being reported before the Mongol conquest. Maybe this language was also acquired somehow.

aargiedude said...

Maju, I've made these maps using the recent data from Myres and Cruciani, and I thought you'd like to see them. Note that I used the Myres/Cruciani only to estimate the make up of R1b1b2, but for the overall frequency of R1b1b2 I used aggregated data from many other studies and yhrd, which I think is much better. Combining the 2 I obtained the frequencies of the below R1b1b2 haplogroups.

L21
http://img231.imageshack.us/img231/5751/l21snptested.gif


U106
http://img255.imageshack.us/img255/5308/u106snptested.gif


U152
http://img185.imageshack.us/img185/9885/u152snptested.gif


P312(xL21,U152) [Note that this includes SRY2627]
http://img830.imageshack.us/img830/3839/p312xl21u152snptested.gif


R1b1b2 "confluence"; this map was made using the blue contour lines in the maps above
http://img188.imageshack.us/img188/1994/r1b1b2confluence.gif

Maju said...

Very nice maps, they seem to confirm the maps posted in the article, though you emphasize some parameters because of your choice of a lower limit of 8% for the main blue line and the lack of marking for greater levels, which are sometimes very distant from a mere 8% (even as much as 49% in R1b1b2a1a2*).

In any case, it seems more and more obvious to me that R1b1b2a1a2 (P312) expanded from the Franco-Cantabrian region. It's probable that Iberian P312* can be reduced by the discovery of one or two characteristic haplogroups. This happens in the sub-Pyrenean populations, where two subhaplogroups (b and c) are very important. However, further SW nothing has been clarified as yet.

Maju said...

Btw, which Cruciani paper is that one?

aargiedude said...

The isoclines are logarithmic, that's why the blue lines span a disproportionate amount of the percentage range.

The Cruciani study is

http://dx.doi.org/10.1016/j.fsigen.2010.07.006

Strong intra- and inter-continental differentiation revealed by Y chromosome SNPs M269, U106 and U152

I have the pdf:

http://www.sendspace.com/file/z61ki9

Strangely, the pdf says Cruciani is the main author, but the dx.doi link above says Trombetta is the main author (he's 2nd author in the pdf).

Maju said...

LOL, I had already read and even linked to that Cruciani paper (zoho viewer). I totally forgot about it though. Thanks in any case. :)

"Strangely, the pdf says Cruciani is the main author, but the dx.doi link above says Trombetta is the main author (he's 2nd author in the pdf)".

Seems a typo: U152 has overwritten Cruciani's name but the "a" note and the comma following it remain.

"The isoclines are logarithmic, that's why the blue lines span a disproportionate amount of the percentage range".

I'd say it's not a good idea. I'd rather draw them as 5%, 10%, 20% and 30% for instance. That would be more informative.
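A quick sketch of why the choice of contour levels matters. It only uses figures already mentioned in this thread (the 8% line, the 49% peak, and the 5/10/20/30% levels suggested above); the helper name is made up, and the single-level case is a deliberate simplification of the maps, not a description of how they were actually drawn.

```python
from bisect import bisect_right


def band(value, levels):
    """Index of the contour band containing value (0 = below the first level)."""
    return bisect_right(sorted(levels), value)


single_top_line = [8]            # simplification: nothing marked above 8%
suggested = [5, 10, 20, 30]      # the levels proposed in this comment

for pct in (8, 49):
    print(pct, band(pct, single_top_line), band(pct, suggested))
```

With a single top line, 8% and 49% land in the same band; with the suggested levels they are three bands apart, which is the extra information I am asking for.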

Clay said...

Thank you for all of your hard work on this particular subject, Maju. I am S116* myself and I always look for new data on this or on Southern Iberia in general. My paternal family tree comes to the USA through Wales, but obviously it was in Iberia before that. According to my searches on the YHRD site, it was in Valencia at some time.

By the way, I always enjoyed having your two blog topics together in one blog. It seemed like your particular contribution. Now I look at the anthropology blog every day but the political blog not as much.

Keep up the good work.

Maju said...

I had to make a choice on that blog issue. I was not really happy on how the "political" posts (typically more common, as there's something going on every day) "ate" the anthropological aspect. However I had to struggle with myself about that.

If you become a "follower" (you can do it anonymously) you'll get instant feedback in your blogger dashboard, so you can read headlines and first paragraphs without visiting anywhere as such. You can also follow via Google reader (very practical for all kinds of stuff, you can read full articles that way without visiting). That way you can keep track of both blogs' contents without much bother. It's like reading the news with breakfast, so to say...

"My paternal family tree comes to the USA through Wales, but obviously it was in Iberia before that. According to my searches on the YHRD site, it was in Valencia at some time".

I guess so then. But you do have a very "British archaic" face, which need not have any relation with your paternal lineage. In fact the first thing I thought was that you look a lot like my apartment mate, who is from London (but your face is more narrow/long and of course other details differ). But the similarity is quite striking in any case, especially the deep-set blue eyes that give that peculiar inquisitive expression.

Cheers.

Clay said...

I certainly look British. All my recent ancestors came from Britain except my great grandmother, who immigrated from Finland. So I am seven-eighths British and one-eighth Finn. If my paternal line passed through Iberia, it was a very long time ago. Still, it only really matches there. My primary interest is the historical migration of peoples that is studied by y-chrom DNA.

Have you ever seen a digital composite of faces from Wales or some other representation of an ancient Briton? Dienekes does a lot of that stuff. I have never seen him create a Briton.

Clay said...

This is a better photo. Still look archaic British? The friend in the photo is one quarter Sicilian and three quarter Irish.

Maju said...

Have you checked Anthroeurope? It's a blog made by a young amateur anthropologist from Gascony (in English). He "collects" faces (from Facebook?) and makes series and composites. I can find three from Wales: Caernarfon, Anglesey and Pembrokeshire (which should be the true essence of Welshness, I imagine).

Enjoy.

Bora Kizilirmak said...

I am R1b1b2a* (I don't know what the star at the end means, but 23andMe shows my haplogroups with that star). Where am I on your map, and can you tell me what those L and M labels are, such as when they say R1b1b2 Lnn or Mnn or Ht35? I am sorry, I am new to this, but what is the other representation of my haplogroup with those L and M numbers?

Maju said...

The asterisk means that it is the "others" category; that your individual lineage is within R1b1b2a but not maybe within downstream haplogroups R1b1b2a1, R1b1b2a2, etc. What exactly it excludes must be interpreted from context.

A more precise nomenclature could be something like R1b1b2a(xR1b1b2a1,R1b1b2a2), for example, where the eXcluded subhaplogroups are listed specifically after the "x" character, but for practical reasons this style is often replaced by forms like R1b1b2a*, where the reader must interpret from context what is excluded from the category.
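For those who like to see it mechanically, the two styles can be read apart with a small parser. This is only a sketch of the notation as described in this comment, with a made-up function name; it is not any official ISOGG format or tool.

```python
import re

# clade name, optional "(x...)" exclusion list, optional trailing asterisk
LABEL = re.compile(r"^(?P<clade>[A-Za-z0-9]+)(?:\(x(?P<excluded>[^)]*)\))?(?P<star>\*)?$")


def parse_label(label):
    """Split a label like 'R1b1b2a(xR1b1b2a1,R1b1b2a2)' or 'R1b1b2a*'."""
    match = LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    raw = match.group("excluded")
    excluded = [part.strip() for part in raw.split(",")] if raw else []
    return {
        "clade": match.group("clade"),
        "excluded": excluded,                            # explicitly listed exclusions
        "implicit_rest": match.group("star") is not None,  # asterisk: exclusions left to context
    }


print(parse_label("R1b1b2a(xR1b1b2a1,R1b1b2a2)"))
print(parse_label("R1b1b2a*"))
```

The explicit form carries its own list of exclusions, while the asterisk form only tells you that something is excluded and leaves the rest to context.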

"what is the other representation of my haplogroup with those L and M numbers?"

Those are mutations, single nucleotide polymorphisms (SNPs) in most cases. While haplogroup names change following a want-to-be-logical method of alphanumerical sorting of our knowledge, mutation names are fixed upon discovery (the letter represents the lab, the number is sequential), although sometimes two or more names represent the same mutation (as described by several labs).

Because haplogroup nomenclature changes over time, it is common practice to list one of the defining mutations alongside the name, sometimes even putting the mutation right after the first part of the name, because long names are impractical anyhow. For example R1b-S116 does not mean that R1b is defined by the mutation S116 (it is not) but that I decided to shorten R1b1a2a1a2 (S116).

For more detailed information on each lineage please visit ISOGG: http://www.isogg.org/tree/

Unknown said...

We just did 23andMe and my husband got R1b1b2a1 for his paternal lineage (Y chromosome). His Y chromosome should come from Galicia, Spain, so it fits the Western Europe location.

Wondering why they don't take it further to the subgroups - do you know if it is lack of resolution in the data or do they usually delve deeper? They do provide SNPs but I haven't analyzed that data yet, we only got the first results yesterday.

Maju said...

I don't know why, or rather I think it has to do partly with politics: some states, like France, are openly hostile to population genetic research and make it very difficult, with the result that France is a question mark in the middle of European genetic knowledge (and it's the largest state of Europe after Russia!). Spain is not really like that. But as genetics could show that the French are not an amorphous bunch, but rather a Frankenstein made of pieces (or whatever, we really know very little), they make research near impossible.

Spain is not that bad, but there it's economic resources and popular interest that are lacking, so researchers have to make do with relatively old means. They are doing a good job in many cases in spite of that, but there is neither the sociological interest nor the popular expenditure on private genetic tests like yours that creates a favorable climate for research elsewhere. So research is largely in the hands of US or German and Swedish universities and that causes a bias. We know more about very rare Swedish R1b subclades than about the bulk of Iberian R1b, let alone the French!

Maju said...

Basically most people in Latin Europe do not care about their ancestry. They are Romans, which means a "mixed potpourri", and they like it that way. So population genetics? What for? That's "racist"! They say or think...

Unless that changes, do not expect big advances. Most of the research done in Iberia itself is foreign (although there are some exceptions in the Basque Country and Catalonia). There was even that horrible study on Iberian autosomal DNA which totally excluded not just Basques but every province or region around them, out of obvious political fear of showing that Basques are actually different (which is anyhow shown in other foreign studies time and again). So basically they sampled Castilians, Asturians and Valencians (these three amalgamated amorphously) and also Catalans and Andalusians (somewhat different each).

Charles said...

Thanks for the info Maju. I tested both my maternal grandpa's Rosario line and my great maternal grandpa's Figueroa line and they both came out as R1b1b2a1a. It's clear to me that my maternal family, although living way up in the Hills of Puerto Rico for centuries, very much had their roots in the Canaries and Iberia. :-)

Unknown said...

I had my dad's DNA done through 23andme. His lineage comes from Genoa and his haplogroup came back as R1b1b2a1a2d. All of his grandparents immigrated to the US in the late 1800s. It seems the ancestry portion isn't that great yet, as his Italian heritage appears to be underreported in their percentages (father was Italian, mother was Finnish), but from my understanding of this, it would appear what the paper trail says is accurate in regards to his haplogroup. Very interesting! Thanks.

Unknown said...

I would have to agree. I have the R1b L11, and I come back Anatolian on oracle 4.