Monday, October 12, 2009

R1b1b2 tree revisited


A few days ago a reader pointed out a meaningful error in my previous version of the West Eurasian R1b structure. The tree was correct, but I had an error in the DYS loci numbers that caused me some confusion when comparing with other data, such as that of the valuable Ht35 project (FTDNA).

So I have been rethinking the whole issue from scratch and have just finished producing a new version of the same tree, where haplogroup and haplotype structure are blended as far as my knowledge reaches.

As mentioned in the picture, the haplotypes are taken from Alonso 2005, a paper that may be somewhat limited in some respects, such as its restrictive choice of DYS markers, but is still the widest academic survey of R1b in West Eurasia that I know of. More specifically, they are taken from his selection of the most common haplotypes (above 2%), as shown in the distribution map (two were duplicated and hence irrelevant, and another one, frequent only in Iceland, has also been ignored for convenience). R1b1b1 is only found in Central Asia; R1b* is extremely rare outside Africa (Egypt and south of the Sahara, where it surely makes up a distinct subhaplogroup whose defining SNPs have not been found yet); and the minor haplogroups R1b1a and R1b1c are practically limited to Sardinia and Lebanon, which Alonso did not sample. In practical terms, therefore, this is the same as talking about haplogroup R1b1b2.

In any case, I have checked that the haplotypes correspond with the haplogroup-described ones, and I am quite sure that the above haplogroup sets (yellow boxes with orange legend) are very much correct. All haplotypes seem to correspond to only one of the R1b1b2 structural layers, except the modal (most common) 14-24-11-13-13, which in fact belongs to two of them: R1b1b2a1* and R1b1b2a1a. This, together with the haplotype tree structure, implies that all the SNPs at the root of R1b1b2a1a (only L11 is mentioned, but three others are known) happened within this haplotype, though there may be some differences at other loci, which I have not bothered to research yet. What is clear is that some mutations (haplogroup-defining SNPs at least) occurred within this modal haplotype, and hence I had to duplicate it in order to show it with and without the L11 (and the other three) SNP mutations that define haplogroup R1b1b2a1a (I marked this with a colored line, so the difference is easy to spot).

The four most common haplotypes by local frequencies are shown in bold and larger type, in the same color as in Alonso's map, for easier identification. It is noticeable that all haplotypes have either a West Asian or a European distribution (once we exclude the somewhat mixed Croatian pool and maybe minor erratics) except two: the "modal" 14-24-11-13-13 (in red) and what I call the "Anatolian modal" 14-24-11-13-12 (in cyan blue). However, the latter is rare in Europe, except maybe in Croatia and Italy, and the former is much more common almost anywhere in Europe than in Asia. The dividing line between West Asia and Europe (mostly West Europe) is therefore around the R1b1b2a1 node.

Geographic structure

R1b1b2* - Its only major haplotype is found only in West Asia (rare) and among Berbers (more frequent).

R1b1b2a* - It is found almost only in West Asia, with some offshoots into SE Europe and even, at very low frequencies, into Central Europe, Iberia and Ireland. Except for Croatia, all its presence in Europe seems to belong to the "Anatolian modal" clade. It is not found among Berbers though.

R1b1b2a1* - It is found almost exclusively in Europe. The only exception is some amount of the modal haplotype, which is also found in West Asia and among Berbers, though I am not sure whether this belongs to this haplogroup or to the derived R1b1b2a1a. Apart from the modal haplotype (which is also the root), it has two branches, both rather widespread in their root haplotypes, though they seem more frequent towards Central/Northern Europe. The more common of the two (root at 14-23-11-13-13, color: purple) has its highest frequencies in the Low Countries, of all places, while the less important one seems to have its highest frequencies in Wales, with a somewhat Atlantic distribution if anything.

R1b1b2a1a - This is the most common R1b in Europe, being quite obviously rare elsewhere, except among Berbers. As such it shows a relatively wide array of the more relevant haplotypes (above 2% in Alonso's sample). Surely the most important haplotype is the "modal" one (I'd say from other data that about 80% or 90% of this haplotype belongs to this haplogroup), which is widely distributed. Besides, it has three branches, one of which is quite important too (root at 14-24-10-13-13, color: black) and is widely distributed as well, with its highest frequency in Scotland (though it is also common in other populations, including Berbers but not West Asians). The other two are less important and perhaps show a certain Atlantic tendency, with lower-tier haplotypes being rather common among Berbers.
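The layer assignments described above can be written down as a small lookup table. This is only a sketch of the idea: it includes just the haplotypes whose layer is spelled out in the text (placing the "Anatolian modal" under R1b1b2a* is my reading of the list above), and the names HAPLOTYPE_LAYERS, layers_for and ambiguous_haplotypes are made up for illustration.

```python
# Haplotypes are written in Alonso's marker order: DYS 19-390-391-392-393.
# Only assignments described in the post are included (illustrative, not exhaustive).
HAPLOTYPE_LAYERS = {
    "14-24-11-13-13": {"R1b1b2a1*", "R1b1b2a1a"},  # the "modal", shared by two layers
    "14-24-11-13-12": {"R1b1b2a*"},                # the "Anatolian modal"
    "14-23-11-13-13": {"R1b1b2a1*"},               # branch root, peaks in the Low Countries
    "14-24-10-13-13": {"R1b1b2a1a"},               # branch root, peaks in Scotland
}


def layers_for(haplotype):
    """Return the structural layers a haplotype is assigned to (empty set if unlisted)."""
    return HAPLOTYPE_LAYERS.get(haplotype, set())


def ambiguous_haplotypes():
    """Haplotypes listed under more than one layer; these need SNP testing to resolve."""
    return sorted(h for h, layers in HAPLOTYPE_LAYERS.items() if len(layers) > 1)


print(ambiguous_haplotypes())
```

With this table, only the modal haplotype comes out as ambiguous, which is the reason it had to be drawn twice in the tree.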

And this is what I can tell about this important haplogroup for the moment.
25 comments:

Anonymous said...

Well, mine is 13 24 14 10 10 P-312

Is it common to have 13 and not 14 at the beginning?

Maju said...

If your sequence is correct for the said DYS loci (please double-check), what is really rare is to have DYS393=10 and that particular combo in general, not DYS19=13, which does exist.

In the supplementary material of the paper there is a tabbed document that can be easily converted into a spreadsheet, which includes many haplotypes (I only used the most common ones), most of them spotted in only a few individuals. However, I can't find anything even close to your sequence, so I suspect it must be an error (maybe you scrambled the loci?). I see 13-24-... but the other three markers are nothing like yours in any case.

However, if you have been tested for P312, then you are a clear case of R1b1b2a1a2 (cf. ISOGG).

But if your sequence is correct for the DYS markers 19-390-391-392-393, then it is a most rare haplotype that does not even have a close relative in Alonso's sample of R1bs. The values of DYS393 are all between 12 and 14, most commonly 13 for R1b1b2a1 (12 for the "Asian" upstream variant, but never 10). Those of DYS392 are somewhat wider, 9-15, but again your DYS391 value does not seem to be included: the range is 9-13.

The "closest" I see (on a quick search) is 13-24-12-13-13, a rare haplotype spotted in England and Armenia (1 individual each). You would still be at least three multi-step mutations away (and multi-step mutations are said not to exist, though I also know someone who contests that).
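The comparison above can be sketched in a few lines. This assumes the reader's sequence really is in the order DYS 19-390-391-392-393 (which is exactly what is in doubt), uses the value ranges quoted above for Alonso's sample, and the function names are hypothetical.

```python
# Value ranges reported above for Alonso's R1b sample (inclusive).
OBSERVED_RANGES = {"391": (9, 13), "392": (9, 15), "393": (12, 14)}
ALONSO_ORDER = ("19", "390", "391", "392", "393")


def parse(haplotype):
    """Turn '13-24-14-10-10' into a dict of DYS marker -> repeat count."""
    return dict(zip(ALONSO_ORDER, (int(x) for x in haplotype.split("-"))))


def out_of_range(haplotype):
    """Markers whose value falls outside the ranges seen in the sample."""
    values = parse(haplotype)
    return [m for m, (lo, hi) in OBSERVED_RANGES.items()
            if not lo <= values[m] <= hi]


def locus_differences(a, b):
    """Absolute difference in repeat count at each locus."""
    pa, pb = parse(a), parse(b)
    return [abs(pa[m] - pb[m]) for m in ALONSO_ORDER]


reader = "13-24-14-10-10"   # as posted, read in Alonso's order
closest = "13-24-12-13-13"  # rare haplotype seen in England and Armenia

diffs = locus_differences(reader, closest)
print(out_of_range(reader))          # markers outside the sampled ranges
print(diffs)                         # per-locus distance to the closest match
print(sum(1 for d in diffs if d > 1))  # loci that differ by more than one step
```

Three loci differ by more than one repeat each, and two of the values fall outside anything seen in the sample, which is why a scrambled marker order looks like the likeliest explanation.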

However, something closer may exist in some database. I'd suggest you join a specialized forum like the one at Family Tree DNA, where they might be able to shed more light on the issue. But first make sure you have not scrambled the loci.

Anonymous said...

I'm R1b1b2a1b. FTDNA did my test. I actually tested 67 markers.

My FTDNA sequence is:

393=13; 390=24; 19=14; 385a=10; 385b=15; 426=12; 388=12; 439=12 etc. That's what I meant.

Paul Mize said...

Have read your posts on more than one website. You seem to know your stuff.
My FTDNA sequence is:

13 25 14 10 11 14 12 12 12 13 13 29

18 9 9 11 11 25 14 19 28 15 15 16 18

11 11 19 22 16 15 20 17 34 35 12 12

That's 37 markers can you tell me my most precise subclade of R1b2?
Thank you,

Maju said...

"I'm R1b1b2a1b."

That's old nomenclature. L51 was inserted last year and hence all clades downstream gained yet another letter or number as needed. Most FTDNA lists still show the old nomenclature though, which may be confusing.

Check ISOGG, I insist, which is the quasi-official reference worldwide for these nomenclature matters and is universally acknowledged.

"393=13; 390=24; 19=14; 385a=10; 385b=15; 426=12; 388=12; 439=12 etc. That's what I meant."

Look at the DYS numbers (the ones at the left). Alonso in his paper and I in this post and graph, have used only the following DYS markers in this order: 19-390-391-392-393.

Hence your sequence for comparison with this post is: 14-24-?-?-13. Check DYS 391 and 392 for further detail, but so far it looks like a completely normal sequence within R1b1b2a1. DYS393=13, or higher, almost always defines this haplogroup, while DYS393=12 rather defines the R1b1b2(xR1b1b2a1) upstream para-haplogroup, most typical of West Asia (ht35).
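For anyone who wants to do this reordering for their own results, here is a minimal sketch. It takes markers labelled by DYS number and prints them in the order used by Alonso and in this post, with "?" for anything not reported. The function name is made up for the example.

```python
# Marker order used in Alonso's paper and in this post's tree.
ALONSO_ORDER = ("19", "390", "391", "392", "393")


def to_alonso_haplotype(markers):
    """Build the 5-marker string from a dict of DYS number -> repeat count.

    Markers that were not reported are shown as '?'.
    """
    return "-".join(str(markers.get(dys, "?")) for dys in ALONSO_ORDER)


# Values quoted in the comment (DYS391 and DYS392 were not given).
reported = {"393": 13, "390": 24, "19": 14, "385a": 10, "385b": 15,
            "426": 12, "388": 12, "439": 12}

print(to_alonso_haplotype(reported))
```

Keying by DYS number rather than by position avoids the scrambled-loci problem altogether, since the order in which a lab lists the markers no longer matters.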

Maju said...

"Have read your posts on more than one website."

Really? I feel honored.

"You seem to know your stuff."

Let's say just that I have got some idea, nothing else.

My FTDNA sequence is:

13 25 14 10 11 14 12 12 12 13 13 29

18 9 9 11 11 25 14 19 28 15 15 16 18

11 11 19 22 16 15 20 17 34 35 12 12

That's 37 markers can you tell me my most precise subclade of R1b2?
Nope, because I have not researched haplotypes in such depth, with so many markers. You would probably get better answers at the FTDNA forums or somewhere similar.

However, and assuming that you used the normal FTDNA sequence order, for comparison with this post, and Alonso's paper, the sequence would be "translated" (and highly simplified) into 14-25-10-13-13, I think.
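The "translation" can be sketched as follows, assuming the usual order of FTDNA's first twelve markers (the same order in which a later commenter lists his labelled values in this thread). The constant and function names are only illustrative.

```python
# Assumed order of FTDNA's first 12 markers; if a report uses another
# order, this tuple has to be changed accordingly.
FTDNA_FIRST_12 = ("393", "390", "19", "391", "385a", "385b",
                  "426", "388", "439", "389i", "392", "389ii")
ALONSO_ORDER = ("19", "390", "391", "392", "393")


def ftdna_to_alonso(values):
    """Pick the five Alonso markers out of an unlabelled FTDNA 12-marker row."""
    by_marker = dict(zip(FTDNA_FIRST_12, values))
    return "-".join(str(by_marker[m]) for m in ALONSO_ORDER)


# First row of the 37-marker result quoted above.
first_row = [13, 25, 14, 10, 11, 14, 12, 12, 12, 13, 13, 29]

print(ftdna_to_alonso(first_row))
```

Only the first row is needed, because all five markers of the short series sit within FTDNA's first twelve.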

At first sight this could be a single-step variant of two of the common haplotypes I mention: 14-25-10-13-12 and 14-25-11-13-13. The first one is essentially Anatolian and Croatian (R1b1b2a*), while the second is Western European (R1b1b2a1*).

Nevertheless, DYS393 is a crucial marker that almost always describes the difference between R1b1b2(xR1b1b2a1), i.e. the "West Asian" clade, which has DYS393=12, and R1b1b2a1, the "European" clade, which has DYS393=13 or 14 in almost all cases. So I'd say that in all likelihood your Y-DNA belongs to the European clade, being a single-step variant of the second aforementioned haplotype (14-25-11-13-13), which is, if I'm correct, part of R1b1b2a1(xR1b1b2a1a).

In other words: it should be L51+, L11-. However only direct SNP testing can confirm this.
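The reasoning above (find the single-step neighbours, then let DYS393 break the tie) can be put as a rough sketch. The DYS393 rule is only the rule of thumb stated in this comment, not a real classifier, and the function names are invented for the example.

```python
def dys393_hint(haplotype):
    """Rule of thumb from this thread, based on DYS393 (the last locus).

    12 suggests the "West Asian" R1b1b2(xR1b1b2a1); 13 or more suggests
    the "European" R1b1b2a1. Only SNP testing can confirm either.
    """
    dys393 = int(haplotype.split("-")[-1])
    if dys393 == 12:
        return "R1b1b2(xR1b1b2a1)"
    if dys393 >= 13:
        return "R1b1b2a1"
    return "unclear"


def is_single_step(a, b):
    """True if the haplotypes differ by exactly one repeat at exactly one locus."""
    diffs = [abs(int(x) - int(y)) for x, y in zip(a.split("-"), b.split("-"))]
    return sum(diffs) == 1


query = "14-25-10-13-13"
common = ["14-25-10-13-12", "14-25-11-13-13"]

neighbours = [h for h in common if is_single_step(query, h)]
same_side = [h for h in neighbours if dys393_hint(h) == dys393_hint(query)]

print(neighbours)
print(same_side)
```

Both common haplotypes are one step away, but only 14-25-11-13-13 falls on the same side of the DYS393 divide as the query.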

I'll now check Alonso's long haplotype list to see if and where your clade has been sampled.

Maju said...

Tantalus:

14-25-10-13-13 is relatively common and does seem essentially European, as I thought (and hence should be under L51).

Alonso spotted it in 10 Scots, 21 English, 2 Welsh, 4 Irish, 1 Croatian, 1 Norwegian, 1 Austrian, 2 Germans, 5 Danes, 1 Icelander, 9 Italians, 4 Basques and 13 Iberians. Notice that these are absolute figures, not proportions, and each one should be compared with the size of the regional sample (n), which is pretty large for Brits and Iberians.

In any case it is quite clear that it is not present, as far as this study goes, outside of Europe and that it is part of R1b1b2a1-L51.

Maju said...

Oops, in the related haplotype "14-25-11-13-13" I really meant to bold DYS391=11, not DYS393=13, resulting in: 14-25-11-13-13, meaning with the bold type that that is the one locus mutated in your sequence. My bad.

Paul Mize said...

Thank you for your thoughts and input.

Unknown said...

Maju:

Thank you for the update. I believe that your re-analysis is accurate, based on my own research. My grandfather's haplotype is R1b1b2a1 (14-23-10-13-13). However, my own autosomal report shows no European influence whatsoever, but does show Berber from Morocco. This was the reason I felt your initial analysis of this as a "Scottish" haplotype was oversimplified.

Maju said...

Thanks, Xavier.

As I said before, I did not mean that it was originally Scottish, just that the haplotype appeared more frequently among them. It was a mere convenience tag.

All these common haplos are geographically mapped in the original paper and listed in a supplementary material item. You should check them for better detail.

Anyhow, this time I have avoided these convenience tags for that reason.

Unknown said...

In the process of reading the paper now, as a matter of fact.

Unknown said...

zingaro, B8983
Please look at my R1b1b2a1a2 DYS
Markings. I think I am quite different than what you have evaluated.
imegyptianpharaoh@yahoo.com

Maju said...

Feel free to describe them here or send me an email (the address is at the left margin of the blog), Pharaoh.

kingle said...

Hope you are all doing well. Can anyone help solve the puzzle of this rare sequence, which was tested at National Geographic as an R1b1b2 and came out with the following results:

DYS393=14, DYS19=14, DYS391=11, DYS439=13, DYS389 I=13, DYS389 II=17, DYS388=12, DYS390=23, DYS426=13, DYS385a=11, DYS385b=14, DYS392=13.

The National Geographic specific results were R1b, M343, Subclade R1b1b2, M269

and: M168>P143>M89>L15>M9>M45>M207>M173>M343

This is rare at 393=14 and 426=13, although it has many matches, with one deviation, among northern European men at Ysearch. The predictor also gives it a 25% chance of being Frisian, but I have not seen any exact matches. Has anyone come across a match? Can anyone take a stab at its origins? Much appreciated.

Maju said...

I'm not familiar with DYS426. If there is just one mutation difference you can always consider it was surely introduced to your lineage recently, maybe in yourself, your father, your grandfather... That is, I understand, common enough (rare but very real).

The main sequence (following Alonso's short series of DYS 19-390-391-392-393) is a total match with an important R1b1b2a1 clade in the tree above. It does seem to me a likely Northern European variant.

If you're planning further testing, I'd test for SNPs L11 and U106, though most likely (I think) it'd be negative and hence your haplogroup should be R1b1b2a1*.

Haakon said...

Hi, could you please tell me where

R1b1b2a1a2d3

12 24 15 11 11 14 12 12 11 12 13 29 comes from? The closest I can find in your diagram is Belgium?

Best Regards

Maju said...

No, sorry Haakon. I'm assuming it's part of R1b1a2a1a2d-L238/S182 as per ISOGG (FTDNA nomenclature is non-standard and often confusing) and therefore a newly proposed subclade of R1b1a2a1a2-S116, which is the most common R1b sublineage, what I call the South or SW European clade, most common in Iberia, France and British Isles, see: http://leherensuge.blogspot.com/2010/08/r1b1b2a1-is-almost-unique-of-west.html.

I'm not really working with STR markers anymore: they were of some use before the SNP substructure was discerned, but nowadays not anymore (at least not for me, being interested in population genetics and not personal ancestry). In any case, when you present STR markers out of context, as in this case, you should label them with their DYS numbers; otherwise they are just meaningless.

Haakon said...

I'm sorry, as follows:

DYS393 - 12
DYS390 - 24
DYS19** - 15
DYS391 - 11
DYS385 - 11-14
DYS426 - 12
DYS388 - 12
DYS439 - 11
DYS389I - 13
DYS392 - 13
DYS389II*** - 29

Maju said...

Well, thanks but I can't help much. You will have noticed that the sequence is not even listed as such in the schemes, surely because it's not too frequent. Also, as I said before I was back in the day (2007-2009 I guess) trying to reconstruct a meaningful substructure of R1b but nowadays that has been done much better by means of SNPs.

Do you even know what SNPs define your haplogroup? That may be more informative (or alternatively the standard ISOGG name, see: http://www.isogg.org/tree/), however if it is a rare and newly proposed haplogroup, as I suspect, it's likely that it will have few known members and therefore we'd have to look to the upstream affiliation, for example R1b-S116, which as I said is widespread and may have expanded from the Franco-Cantabrian Region in post-glacial times (or who knows?, it's pretty messy).

Haakon said...

Thanks - I have six or seven 12/12 matches across all databases and they are essentially the same persons, almost all English/Irish. The page at Leherensuge you suggested doesn't exist anymore. What type of test do you recommend for me to get more knowledge?

Best regards

Maju said...

The page is available to me, maybe you selected something extra in the link like the final dot (intended just for punctuation, sry). Try this way:

LINK

"What type of test do you recommend for me to get more knowledge?"

Not sure. I'm not really familiar with the nuances of commercial DNA testing (I really do not care much about personal genetics and I feel that companies promise more than they can actually deliver). What you want to know first of all is the SNPs (single nucleotide polymorphisms) that define your haplogroup, because nomenclature changes and what you described is not even listed at ISOGG (under that name). So knowing something about SNP markers is important, I'd say fundamental. These SNP markers keep their names all the time, although the only logic they follow is that of their discoverers, so they are less useful for structural or phylogenetic discernment, and that's why the other nomenclature exists as well.

Haakon said...

Thanks for your reply - as far as I understand, the haplogroup is now called (2008) R1b1a2a1a1b3c, L2, U152... confusing.

Maju said...

Yes it is very confusing, even for those with a deep interest sometimes. Check the most updated (2013) version: http://www.isogg.org/tree/ISOGG_HapgrpR.html - that's the standard reference ("official", so to say).

From what you say your haplogroup is:
R1b1a2a1a2b1 (L2/S139)

The first is just a name which follows a pattern and is revised every several months, as new discoveries arrive. Names are getting crazily long, I know.

The part in parentheses gives two synonymous names for the same mutation, known as a "single nucleotide polymorphism" (SNP), which means that one "letter" (nucleotide) has changed (C→T or whatever) in the lineage. These don't change names but often have more than one (two or more different researchers claimed the discovery, or whatever).

A different thing are the short tandem repeats (STRs), which is what we were talking about at first. These are not very reliable markers; they mostly serve to get a glimpse of what kind of affiliation there is in the absence of haplogroup (SNP-defined) knowledge, for example within a known haplogroup.

OK, so I do not know a shit about R1b-L2 (new haplogroup) but R1b-U152, its ancestor (larger haplogroup that also includes your lineage) is well known: it is the so-called "Alpine" or "Celtic" clade, which is widespread but most common in Switzerland and Italy.

It is a descendant of the common European R1b-S116, the SW or South clade, centered and most diverse in Southern France, which is why I think it spread with Magdalenian culture (regardless of later re-expansions or whatever).

Check the article I mentioned before, because at least you will get an idea of where R1b-U106 is found. However, I do not know anything as of now about its substructure, including your particular L2 sublineage.

Hope this helps.

Haakon said...

Thanks a lot for your reply, Maju.