Wednesday, May 26, 2010

New paper on Spanish genetics

There's a new paper of some interest on Spanish population genetic structure:

J. Gayan et al. Genetic Structure of the Spanish Population. BMC Genomics. Open access.

The main aim seems to be to provide a Spanish dataset for population genetic research. And so far so good.

However, the sampling strategy is awkward, to say the least:

In the above map (my own, based on the paper's data), red dots indicate sampling locations, while blue areas are regions (autonomous communities) not included in the sample. Most noticeable is that not only has the Basque Country been excluded, but so has the whole surrounding area.

As I say, quite awkward.

Other unsampled areas that are surely distinctive include the Canary Islands and Galicia.

The whole design of the sample has a Castile-centric bias that is difficult to understand.

But, well, that's what they did. And, once we know that, we can go on to look at the results:

In this graph (fig. 6, annotated by me) we can see how Catalans and Andalusians tend to diverge from neutrality in orthogonal directions. To a less clear extent, the North Castilian samples (Arévalo, Segovia) also diverge somewhat.

We can say that PC1 describes a Catalan-versus-other axis and PC2 an Andalusian-versus-other one. The lack of distinctiveness of some geographically peripheral samples, such as the Asturian one (Avilés), may well be caused by the small sample size. It is quite possible that a PC3/PC4 graph would reveal some distinctiveness that is not apparent here.

Remember that PC graphs are merely bidimensional representations of some of the apparent structure, with all the limitations that this implies.
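To make that limitation concrete, here is a minimal sketch of how a PC plot is built from genotype data. It is not the paper's pipeline: the three populations are hypothetical and simulated, genotypes are coded as 0/1/2 allele counts, and the data are only mean-centered (published genetic PCA usually also standardizes each SNP). The explained-variance fractions show how much of the total variation a PC1/PC2 scatterplot actually displays, and how much is left for PC3/PC4 and beyond.

```python
import numpy as np


def simulate_genotypes(pop_sizes, n_snps=500, drift_sd=0.1, seed=0):
    """Toy genotypes (0/1/2 allele counts) for hypothetical populations.

    Each population's allele frequencies are a shared base frequency plus
    Gaussian 'drift' noise: an illustrative assumption, not a model fitted
    to any real data.
    """
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.1, 0.9, size=n_snps)
    blocks, labels = [], []
    for pop, n in enumerate(pop_sizes):
        freqs = np.clip(base + rng.normal(0.0, drift_sd, size=n_snps), 0.01, 0.99)
        blocks.append(rng.binomial(2, freqs, size=(n, n_snps)))
        labels.extend([pop] * n)
    return np.vstack(blocks), np.array(labels)


def pca_genotypes(G, n_components=4):
    """PCA of an individuals x SNPs matrix via SVD of the mean-centered data.

    Returns the first `n_components` PC scores per individual and the
    fraction of total variance carried by every component.
    """
    X = np.asarray(G, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP column
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained


G, labels = simulate_genotypes([20, 20, 20])
scores, explained = pca_genotypes(G)
print(scores.shape)  # (60, 4)
print(f"PC1+PC2 show {explained[:2].sum():.1%} of the variance")
print(f"PC3+PC4 add another {explained[2:4].sum():.1%}")
```

Whatever PC1 and PC2 fail to capture is simply invisible in a two-dimensional scatterplot, which is why a sample that looks unremarkable there may still stand out on later components.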

European comparison

When compared with other populations of European ancestry the PC graph is as follows (annotations by me on fig. 7):

Catalans appear to have some tendency towards Italy and NW Europe, while the less defined eccentricity of Andalusians only seems to tend towards Italy. There are also a couple of Castilian individuals who cluster with NW Europeans, maybe because the North Castile area sampled was the core of Visigothic settlement (just a hunch).

Global comparison

There's not much to say about Spaniards in the global scatterplot (fig. 8), really: all Europeans just cluster very tightly, the same as East Asians (Chinese and Japanese).

What I found intriguing and worth posting this graph is the curious coincidence of the scatter of Kenyan Maasai (MKK) and US African-Americans (ASW). Notice that the LWK sample (Luhya) is also from Kenya but Bantu, and it clusters best with Nigerian Yoruba (YRI). I'm not really sure, because it would need further research, but the almost identical disposition of the MKK and ASW samples is certainly suggestive of the Maasai (and maybe other Nilotes) being somewhat admixed with West Eurasians. Alternatively, the distribution might reflect some African-specific differences (just like Indians in the Eurasian context), and its overlap with African-Americans might to some extent be an artifact of the limitations of PC analysis.

I do miss a comparison with North Africans, which seems to be taboo in Iberian and European population genetic studies. However, I do detect (and not only here) a slight "African" tendency among some Iberians, which may well reflect a greater affinity with North Africans, who are in turn slightly more akin to ultra-Saharan Africans. This would be an interesting matter to analyze.

Update: I totally forgot to mention that just a few days ago I commented on another paper by a Catalan team that did compare Iberians and North Africans, which may serve for comparison. The results appear wildly different in the PC graph, with Catalans, Basques and Cantabrians clustering on one corner and the other Iberians scattered with a clear tendency towards the Eastern Mediterranean. No apparent North African affinity was detected.


Kepler said...

Maybe the Basques just wouldn't let themselves be jabbed.

On the North African absence: I also have the impression that Spaniards in general hate to be reminded of their "sangre mora" (Moorish blood), and they probably do not fancy sub-Saharan influences either. Still, I think the general absence of data from North Africa here may have other reasons: people there may be less inclined to take part in Western genetic studies than, say, people in South Africa, which is closely linked to the West. I would imagine there is much more suspicion. Algeria is dangerous territory. That really leaves Tunisia and Morocco. The Spaniards may have collected the foreign data from others: they don't have much money, so they looked for what was available.

I was trying to find the closest matches for my Y-DNA on Ysearch and there is almost nothing in North Africa. People there need some spare dollars and at least some English to take part in these studies on their own.

All in all: I think the North African underrepresentation has many causes.

And it is a pity. Someday this has to change.

Maju said...

Kepler: you're being totally speculative and borderline offensive. The only thing dangerous in North Africa is the police, but it normally beats the locals and leaves you, the foreign visitor, alone, even if you are behaving outside the law and they know it. That's what our governments pay them for, among other things.

Crossing to Morocco to make all kind of business is extremely easy (certainly a lot easier than traveling to the USA), if Spanish researchers ignore it it's because they choose to. I am sure there are samples of at least the Mozabites available for comparison without leaving the sofa. It's not an objective problem but an ideological choice, the same as not sampling France, which should be a focus of European genetic research and is almost systematically ignored.

North Africa has been researched without problems time and again, but almost never in studies focused on Europeans, much less Iberians. This is a matter of

And yes, private genetic databanks are only for the well-off. Most people, including me, won't pay for them (they should be paid for by UNESCO), and most people really don't care at all who their ancestors were beyond a century or so ago (so, again, they should be paid for by UNESCO, with the funds expropriated from Goldman Sachs).

That's why academic research is generally much more productive than that private waste of time and money.

Kepler said...

I was talking about Algeria: I said ALGERIA IS DANGEROUS TERRITORY. There is no denying that.

Morocco and Tunisia are not Algeria.
Read again what I wrote.

Maju said...

Algeria doesn't seem really dangerous to me, except, as said, for the police. Meh, I live in the Basque Country, and I know that violent attacks by guerrillas (which are tiny anyhow in the Algerian case) only happen on TV and almost never when you are nearby (much less affecting you or anyone you know). I think I've heard a bomb explode twice in my life (and that's more than 40 years), but I see police violence almost every week.

Anyhow, check the facts: while there have been some attacks and shootings between the Islamists and the (also Islamist) government (these Islamists are all nuts!), no foreigner has been attacked in Algeria in at least the last year (the article mentions kidnappings, but in the Sahel countries, not Algeria).

Is it so easy to fall prey to a stereotype projected by some media? I guess so if you are the kind of person prone to believe in such idiocies and not bothering checking the facts.

Anyhow, even assuming Algeria were difficult (which it is not), you can always sample Morocco, Tunisia, etc.

Kepler said...

"I guess so if you are the kind of person prone to believe in such idiocies and not bothering checking the facts. "
I am sure you fall for idiocies as often as I do.

Maju, you are probably meeting Basques and commies only. I live in one of the most Arab cities of Europe. Many friends are from the Maghreb. I read regularly Robert Fisk and others on Algeria and I watch from time to time North African TV in French. I know the situation there probably in the same way as you know your own country (you should know your own country better than that, but that is not my fault).
The simple fact is that there are very few foreigners in Algeria now. I do not want to discuss here whether the attacks that scared them away come from the CIA in Algeria, from Shirley Temple, from the Algerian government, from Ali Baba, from Al Qaeda in Algeria or the Frente para la Liberacion de Gran Canaria, or any combination thereof, but there are very few tourists there and that is the reason why they are not getting killed. Before, they were attacked (by whoever; that is not the discussion here).

Please, read this if you can:
Very few non-Algerians, apart from businessmen with bodyguards and the like, go there now.

Maju said...

We are not talking about tourists. I don't think Algeria has ever been a tourist attraction, unlike Morocco and Tunisia.

We are talking about being able to sample some North Africans and use them as controls. In fact, such individuals have probably already been sampled anyhow (I already mentioned the Mozabite sample, but there are more).

The point is that they are choosing not to compare with them. Every time we spot some potential North African affinity, be it in Iberians or other Europeans, it shows up as a tendency towards the Yoruba, who are not the correct control population at all.

It would be interesting to know whether there is some Iberian/European affinity with North Africans. I am quite sure that there is, but I can't quantify either the populations or the degree with the available data, all because it's some sort of blind spot.

Heraus said...

A much more interesting issue is indeed why France is never sampled, and the main reason is actually quite simple: genetic tests are forbidden unless allowed by judges, even for scientific purposes. That slows down the whole process.

There is a good and objective reason why genetic tests (more generally, owning genetic material) are forbidden. French MPs believe that one cannot dispense justice for oneself. Consequently, genetic tests are forbidden as a whole, since they would be used for paternity tests, which would endanger children's legal situation.

And then there's the only, tacit, true reason: the Jacobin obsession. The French are all supposed to be the same. It would be highly controversial to state that there "might" be regional differences. And since everyone is persuaded that France is homogeneous from Dunkirk to Perpignan, nobody is interested in such issues, hence nobody chooses to study them.

I must say that I hardly understand how France, Western Europe's biggest country, the country of Lascaux, can be ignored in pan-European studies. I've noticed similar phenomena in many scientific fields: since France is not interested in its internal variation, the rest of the world cannot access data on France and somehow believes that France is uninteresting. In linguistics, for instance, this disinterest explains why ridiculous concepts such as an Occitan language stretching from Berry to Catalonia have gained international recognition: France doesn't debate such issues anymore and our neighbours don't get updated news.

One has to live in France to fully "admire" the abstract delirium that jacobinism is.

Maju said...

Thanks for your insightful comment, Heraus. Jacobinism certainly has a bad reputation over here, but I never thought it was so bad. It's something we must denounce, because the territory of modern France and the ancestors of modern French citizens are critical to understanding West European genetics.

As for Occitan, that's another debate but you know I think that Occitan and Catalan are roughly the same, while Gascon is not (it's a different Romance language, even if influenced by Occitan). You just need to read some Gascon and Occitan texts to notice the striking differences.

terryt said...

"As for Occitan, that's another debate but you know I think that Occitan and Catalan are roughly the same, while Gascon is not"

I know next to nothing about any of these languages. But I noticed when I visited a museum in Barcelona that the exhibits were labeled in three languages: Castilian, French and Catalan. I did French for four years in school, and Latin for two, so I could get the meaning of most exhibits. What I also noticed was that the Catalan was sometimes nearer the French, sometimes nearer castillian and often somewhere between the two. It was a bit of a revelation to me regarding dialect chains.

Anyway, I am not surprised that, 'Catalans appear to have some tendency towards Italy and NW Europe'. Presumably especially toward Southeast France.

Maju said...

"What I also noticed was that the Catalan was sometimes nearer the French, sometimes nearer castillian and often somewhere between the two".

Somewhat, yes. In fact, it may sound more "like French" than Italian does because of the loss of the final vowel sound, shared by Catalan/Occitan and French, which tend to end words in a consonant like English. This little trait adds or takes away a lot of intelligibility, IMO. Also, Castilian is more evolved in some traits that other Romance languages keep, like the initial f- of many words (Latin facere gave Castilian hacer but Catalan fer).

However, Italian and French are supposedly closer to each other (Gallo-Romance), while Occitan/Catalan is considered either intermediate or Ibero-Romance. One trait that seems exclusive to the Ibero-Romance languages, AFAIK, is the existence of two different verbs for "to be": one meaning existence/permanent attribute (Sp. ser) and the other location/temporary attribute (Sp. estar). This difference also exists in Basque (izan/egon, though it works somewhat differently than in the neighboring Romance languages), but it's lacking in Gallo-Romance (French, Italian), which behave like English in this matter (It. essere, Fr. être).

I'm not sure whether this trait also exists in Gascon and Occitan, but I presume so, and if so it should surely be considered a Vasco-Iberian substrate influence on the regional Vulgar Latin, and maybe even on Iberian Celtic before it. As far as I can see, no other IE language (save Greek) has such a distinction (nor do most other languages, though there are a few: Korean...)

Sadly, I've lost a most interesting link on how German is a dialect continuum (or rather was before homogenization), instead of neatly packed boxes of Low, Middle and High German. This is something we should always consider when analyzing proto-Languages, which were surely dialect continua themselves.

Maju said...

Anyhow, check this recent post on Mediterranean genetics (I had already forgotten it!), which focuses on Iberians and North Africans and shows how Catalans, Basques and Cantabrians cluster together, while Andalusians and Asturians [and Occitans? Heraus says they may be "Italian immigrants"] do not. However, that other study lacks either Italian or North European comparisons.

It is also a paper by researchers from the state of Spain, but they are a Catalan team (with a Greek lead researcher and collaborators from Toulouse and Germany), while this one is by an Andalusian team with some Madrid members, which may strongly alter the bias.

alex said...

"Crossing to Morocco to make all kind of business is extremely easy (certainly a lot easier than traveling to the USA), if Spanish researchers ignore it it's because they choose to"

Why even bother crossing to Morocco?
We have got Manolo, Maria, Mezian and Malika in Ceuta and Melilla!

joe90 kane said...

Although it isn't specifically relevant to this post,
I thought this news might be of interest -
Secrets of ancient Scottish hunters revealed by camp
Herald (Glasgow)
29 May 2010

I hope Leherensuge doesn't mind this intrusion, but I just thought it would want to be kept up to date with this latest in the archaeology of the earliest evidence of human activity in Scotland.

Leherensuge did carry a post on this development when news was announced of this important new dating of human activity in Scotland.

See also -
Archaeological Research in Progress
Calendar of Events
Society of Antiquaries of Scotland

all the best

this is such a great blog for all sorts of reasons - keep up the great work

Maju said...

Alex: sure, the colonial enclaves in Morocco would have served as a sampling pool too, I guess.

Hi, Joe. I think you mean this post.

The Herald article is somewhat interesting, but I miss something that makes it newsworthy or, alternatively, more extended detail that could make for an interesting post. Basically it just mentions again the settlement c. 14,000 years ago, suggests that c. 12,000 years ago they would have been forced to retreat by the climate, and that people came back (maybe the same people, maybe others) c. 10,000 years ago.

I don't think I can build a post on just that, sorry.

Thanks for your interest anyhow.

joe90 kane said...

Thanks Maju.

No, I wasn't asking you to do a blog post on this corporate newspaper article - what do corporate journalists know about anything, never mind paleo-archaeology.

I was just bringing it to your attention for reference.

Sorry if I gave the wrong impression.

all the best

Maju said...

All fine, no problem. :)

aargiedude said...

What I found intriguing and worth posting this graph is the curious coincidence of the scatter of Kenyan Maasai (MKK) and US African-Americans (ASW).

Indeed, it would seem that the most interesting things in this study are the results for some of their comparison samples, ha ha. But more than the comparison with African Americans, what I found interesting was the big difference between the Maasai of south-central Kenya and the other Kenyan group, from southwest Kenya close to Lake Victoria. The latter is actually closer to the Yoruba (the third African group) than to the Maasai.

I think these Maasai results are extremely interesting. We know that Ethiopians and Somalis cluster about 50%/50% between the Caucasian and sub-Saharan clusters. The Maasai samples lie about 25% of the way from the Yoruba towards CEU. Kenyans live almost exclusively in the southern third of the country; the rest seems to be desert, which extends into Somalia and Ethiopia. This could be the reason for the sharp genetic cline in this region. The Maasai, living in the northernmost part of the southern third of Kenya, would be the closest to receiving genetic influences from Somalis and Ethiopians across the desert, resulting in their (probable) 25% clustering with the Caucasian cluster. This would also explain why this closeness to the Caucasian cluster almost disappears in their neighbors in the same country: the gene flow from Ethiopia/Somalia leaves a visible imprint at the mouth of the exchange (the Maasai) but then dilutes quickly, exponentially, as it advances inland. All of this is an argument for ancient genetic diffusion, as opposed to the argument that Ethiopians and Somalis cluster about 50/50 between the sub-Saharan and Caucasian clusters because of some recent historical mass invasion, migration or whatever, which, quite honestly, I can't find anything about anywhere. But I'm not an expert on the history of East Africa.
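The "25% of the way" reading can be made explicit with a little geometry. A minimal sketch, assuming a simple two-source linear model in which an admixed group sits on the straight line between the two source centroids in PC space (roughly what is expected for recent two-way admixture, though drift and uneven sampling distort it): project the group's centroid onto the axis from source A to source B and read off the fraction. The coordinates below are hypothetical placeholders, not values read off the paper's fig. 8, and the function name is illustrative only.

```python
import numpy as np


def fraction_along_axis(point, source_a, source_b):
    """Fraction of the way from source_a to source_b at which `point` projects.

    0.0 means the point projects onto source_a, 1.0 onto source_b.
    Under a simple two-way linear mixing model this is a crude proxy for
    the share of ancestry from source_b (an assumption, not a formal estimator).
    """
    a = np.asarray(source_a, dtype=float)
    b = np.asarray(source_b, dtype=float)
    p = np.asarray(point, dtype=float)
    axis = b - a
    # Scalar projection of (p - a) onto the A->B axis, divided by the axis length squared
    return float(np.dot(p - a, axis) / np.dot(axis, axis))


# Hypothetical PC1/PC2 centroids (placeholders, not from the paper)
yri = (-0.08, 0.00)
ceu = (0.04, 0.01)
mkk = (-0.05, 0.0025)
print(round(fraction_along_axis(mkk, yri, ceu), 3))  # 0.25
```

Projecting onto the axis, rather than measuring straight-line distance, ignores any sideways offset, which is exactly the part of a sample's position that the two-source model cannot explain.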

PS: Thanks for noting the other study on Iberians. I have to go see it, now.

Maju said...


"We know that Ethiopians and Somalis cluster about 50%/50% between the Caucasian and sub-Saharan clusters".

Do they? I must admit I'm ignorant on this matter and all the references I can think of refer to Y-DNA and mtDNA, which may well be misleading.

Of course, the almost perfect overlap with African Americans does suggest either admixture with Eurasians or an ancient intermediate position. But is this interpretation correct?

There was a paper last year by Bryc et al. on African genetic structure (PDF) that studied West and Central Africa, as well as African Americans and, as controls, several European samples. I thought I had blogged about it, but it seems not.

What is evident from the data (especially fig. 1) is that, while Africans have many differences among themselves, they seem not at all, or only minimally, admixed with Europeans. The Fula, for example, are a typical case of an African ethnicity that has been speculated to have Mediterranean affinities, but Bryc's data clearly demonstrate that, while they are very distinct from other Africans, they show no indication of West Eurasian admixture.

This is also the case (in a different direction) of the various Chadic (Afroasiatic) and Nilo-Saharan (Nilotic) ethnicities researched by Bryc.

PC graphs are tricky and very sensitive to which samples are used and their relative weight. In this sense, Bryc's paper makes the best use of PC analysis and K-means analysis combined, not just by using both but also by adding and removing populations in them, so that what the comparisons mean becomes clearer.
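A toy illustration of that sensitivity to relative weight (this is not Bryc et al.'s method): three idealized, hypothetical populations A, B and C are represented as fixed points in a two-dimensional feature space, and PCA is run once with balanced sample sizes and once with C oversampled tenfold. The populations themselves do not change, yet the leading axis rotates towards C purely because of sample composition.

```python
import numpy as np

# Hypothetical, idealized population profiles in a 2-D feature space
POINTS = np.array([[0.0, 0.0],   # population A
                   [1.0, 0.0],   # population B
                   [0.0, 1.0]])  # population C


def leading_axis(points, counts):
    """Unit PC1 direction when points[i] is sampled counts[i] times."""
    X = np.repeat(np.asarray(points, dtype=float), counts, axis=0)
    X = X - X.mean(axis=0)  # center on the sample mean, which depends on counts
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[0]  # sign is arbitrary, as in any PCA


balanced = leading_axis(POINTS, [1, 1, 1])
oversampled_c = leading_axis(POINTS, [1, 1, 10])
print(np.round(np.abs(balanced), 2))       # [0.71 0.71]: PC1 runs B <-> C
print(np.round(np.abs(oversampled_c), 2))  # [0.54 0.84]: PC1 tilts towards C
```

Adding or dropping whole populations shifts the axes in the same way, which is why rerunning the analysis on different subsets, as described above, helps separate real structure from sampling artifacts.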

However neither Kenya nor any other East African area was sampled by Bryc so the comparison can't be done directly.

Another East African population that showed up as distinct is the Mozambicans (Bantus). This appeared almost accidentally when comparing Bantus and Pygmies in Patin 2009 (commented on here).

So I can only conclude that the widely acknowledged high genetic diversity of Africa has some very intriguing and complex structure that we have only begun to understand. I doubt that most of it has anything to do with Eurasian admixture of any sort, but I'm really hungry for more comprehensive, continent-wide research. K=16 at least! :)