There's a new paper of some interest on Spanish population genetic structure:
J. Gayan et al. Genetic Structure of the Spanish Population. BMC Genomics. Open access.
The main aim seems to be to provide a Spanish dataset for population genetic research. And so far so good.
However, the sampling strategy is awkward, to say the least:
In the above map (my own, based on the paper's data), red dots indicate sampling locations, while blue areas are regions (autonomous communities) not included in the sample. Most noticeable is that not only the Basque Country has been excluded but also the whole surrounding area.
As I say, quite awkward.
Other unsampled areas that are surely distinctive are the Canary Islands and Galicia.
The whole design of the sample has a Castile-centric bias that is difficult to understand.
But, well, that's what they did. And, once we know that, we can go on to look at the results:
In this graph (fig. 6 annotated by me) we can see how Catalans and Andalusians tend to diverge from the central cluster in orthogonal directions. To a less clear extent, the North Castilian samples (Arévalo, Segovia) also diverge somewhat.
We can say that PC1 describes a Catalan-other axis and PC2 an Andalusian-other one. The lack of distinctiveness of some geographically eccentric samples such as the Asturian one (Avilés) may well be caused by the small size of the sample. It is very possible that a PC3/PC4 graph would reveal some distinctiveness that is not apparent here.
Remember that PC graphs are merely two-dimensional representations of some of the apparent structure, with all the limitations that this implies.
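To illustrate that limitation, here is a minimal Python sketch on purely synthetic data (nothing to do with the paper's genotypes). Everything in it is an assumption made for the example: the group names A, B, C and "rest", the sample sizes, the number of markers, the noise level and the helper names `simulate_groups`, `pca_scores` and `separation`. It builds a small, weakly differentiated group C next to two larger, more differentiated groups and checks on which component C stands out.

```python
import numpy as np


def simulate_groups(rng):
    """Toy data: four groups; A, B and C are shifted on disjoint marker blocks."""
    sizes = {"rest": 60, "A": 30, "B": 30, "C": 10}  # C is the small sample
    blocks = {"A": slice(0, 80), "B": slice(80, 140), "C": slice(140, 160)}
    n_markers = 200
    labels = np.repeat(list(sizes), list(sizes.values()))
    # Background noise shared by every individual.
    X = rng.normal(0.0, 0.3, size=(labels.size, n_markers))
    # Each differentiated group gets a shift on its own block of markers.
    for name, cols in blocks.items():
        X[labels == name, cols] += 1.0
    return X, labels


def pca_scores(X, n_components):
    """Principal component scores from the SVD of the column-centered matrix."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def separation(scores, mask):
    """Gap between group mean and everyone else, in SD units of each PC."""
    diff = scores[mask].mean(axis=0) - scores[~mask].mean(axis=0)
    return np.abs(diff) / scores.std(axis=0)


rng = np.random.default_rng(0)
X, labels = simulate_groups(rng)
scores = pca_scores(X, 4)
sep_c = separation(scores, labels == "C")

for i, value in enumerate(sep_c, start=1):
    print(f"PC{i}: separation of group C = {value:.2f}")
```

In this toy setup the first two components are taken up by the larger groups, so a PC1/PC2 scatterplot would show C sitting close to everybody else, while its distinctiveness only shows up on PC3. Whether anything similar applies to the Avilés sample is of course unknown without the actual higher components.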
When compared with other populations of European ancestry the PC graph is as follows (annotations by me on fig. 7):
Catalans appear to have some tendency towards Italy and NW Europe, while the less defined eccentricity of Andalusians only seems to tend towards Italy. There are also a couple of Castilian individuals who cluster with NW Europeans, maybe because the North Castile area sampled was the core of Visigothic settlement (just a hunch).
There's not much to say about Spaniards in the global scatterplot (fig. 8), really: all Europeans just cluster very tightly, the same as East Asians (Chinese and Japanese).
What I found intriguing, and what made this graph worth posting, is the curious coincidence of the scatter of Kenyan Maasai (MKK) and US African-Americans (ASW). Notice that the LWK sample (Luhya) is also from Kenya, but they are Bantu and cluster best with Nigerian Yoruba (YRI). I am not really sure, because it would need further research, but the almost identical disposition of the MKK and ASW samples is certainly suggestive of the Maasai (and maybe other Nilotes) being somewhat admixed with West Eurasians. Alternatively, the distribution might reflect some African-specific differences (just like Indians in the Eurasian context), in which case its overlap with African-Americans would be to some extent an artifact of the limitations of PC analysis.
I do miss a comparison with North Africans, which seems to be taboo in Iberian and European population genetic studies. However, I do detect (and not only here) a slight "African" tendency among some Iberians, which may well reflect a greater affinity with North Africans, who are in turn slightly closer to sub-Saharan Africans. This would be an interesting matter to analyze.
Update: I totally forgot to mention that just a few days ago I commented on another paper, by a Catalan team, that did compare Iberians and North Africans and may serve for comparison. The results appear wildly different in the PC graph, with Catalans, Basques and Cantabrians clustering in one corner and the other Iberians scattered with a clear tendency towards the Eastern Mediterranean. No apparent North African affinity was detected.