Saturday, February 9, 2008

Biased European genetics (again)

I am pretty sure it is not really intentional, but every time they study autosomal European genetic variation, they forget to study the central strip (French, Austrians, Hungarians, etc.). They seem to like going to the extremes and somehow "demonstrating" a fallacious discontinuity between Northern and Southern Europe.

This is the case with the new research by a US team led by Chao Tian. The overall results are most visible in figure 2C (excluding Ashkenazim), which shows a distribution along both the East-West and the North-South geographical axes (the latter perhaps exaggerated by the lack of intermediate populations). Smaller components (figure 6) also seem to emphasize the dominant E-W cline, though the authors dismiss them.

This dismissal is surely a mistake. A slightly older study of the same kind (Bauchet et al., 2007) showed that when only two components are considered, the results are heavily distorted. Components that are minor overall are often very important, even dominant, in one specific population. These locally dominant components become invisible when only the two or three most widespread ones are considered, so that large, geographically defined populations end up classified by a minor component of their gene pool.
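To make that last point concrete, here is a toy Python sketch. It is a deliberate simplification: programs like STRUCTURE re-estimate the components from scratch at each K, whereas this sketch only compares which component wins when just two preselected ones are kept. The population names, component indices and proportions are hypothetical, invented for illustration, and are not taken from Tian et al. or Bauchet et al.

```python
# Hypothetical ancestry proportions across five components (invented for
# illustration only; NOT data from any study). Each row sums to 1.
proportions = {
    "pop_A": [0.15, 0.10, 0.60, 0.10, 0.05],  # component 2 dominates locally
    "pop_B": [0.55, 0.20, 0.10, 0.10, 0.05],  # component 0 dominates
}

# Hypothetical choice: components 0 and 1 are the two most widespread overall.
MAJOR = (0, 1)


def dominant_component(props):
    """Index of the largest component when all five are considered."""
    return max(range(len(props)), key=lambda i: props[i])


def label_with_two(props, kept=MAJOR):
    """Classify using only the retained components: the larger share wins."""
    return max(kept, key=lambda i: props[i])


for name, props in proportions.items():
    share_kept = round(sum(props[i] for i in MAJOR), 2)
    print(name, dominant_component(props), label_with_two(props), share_kept)
```

In this toy case pop_A is labelled by component 0, which makes up only 15% of its pool, while its locally dominant component 2 (60%) disappears from view entirely.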

The five main European clusters

The image above is a five-pole diagram I drew some time ago based on the K=5 graph below from that other study (the study went up to K=6, but the sixth component was too diffuse to matter; it may be a Balkan or Eastern European element, as these areas were not sampled).

When only two components are considered (K=2, not shown, but corresponding to the red and blue ones), Spanish samples, for instance, fall almost entirely in the red "Near Eastern" zone, while Basques come out extremely ambiguous (due to the near absence of both components).

Seen instead as a plot of five components, the Spanish and Basque samples clearly cluster primarily with themselves and with no one else. Some Spanish samples are somewhat intermediate with Eastern Mediterraneans, while others are intermediate with Basques, but most cluster on their own.

Another finding of the K=5 plot is a Central-Northern European cluster (green) distinct from the "Finnic" blue marker. It is also noticeable that many Northern Europeans lean not just towards Finns but also towards Basques, or in some cases even Southern Europeans. Again, the lack of representation of the intermediate strip (France is represented by only one sample, while the Danubian basin, the Balkans and Eastern Europe are totally absent) creates some distortion, enhancing N/S differences.

By the way, how do I read these clusters? In my opinion, the Iberian (cyan), Basque (orange) and Central-North (green) clusters must represent Late Paleolithic Magdalenian and/or Epipaleolithic populations: those of the Iberian, Franco-Cantabrian and Rhine-Danube regions respectively. The two principal components, instead, would represent two later arrivals: Neolithic for the red (Eastern Mediterranean) one and Uralic (Finno-Ugric) for the blue one. The latter actually poses some difficulties of interpretation: it is quite possible that this "Uralic" element was also spread by Indo-European migrations, especially those linked to Scandinavia and the Baltic region, such as the Germanic peoples.

