New blogs

Leherensuge was replaced in October 2010 by two new blogs: For what they were... we are and For what we are... they will be. Check them out.

Saturday, July 31, 2010

Korean genetics: between China and Japan with some unique personality


New paper on Korean genetics:


Jongsun Jung, Hoyoung Kang et al. Gene Flow between the Korean Peninsula and Its Neighboring Countries. PLoS ONE 2010. Open access.

The most relevant results are probably in these graphs that follow:

Figure 3: Genetic Structure

The legend and details for the various samples can be seen in figure 2. Please notice that CB (Cambodian) is mistakenly placed in Northern Asia when it should be in Southern Asia (i.e. SE Asia), along with the Vietnamese (VN) and the Vietnamese-Koreans (probably VC, though tagged as VK elsewhere).

Notice also that there are two K=5 runs in the global cluster analysis (C), one of them marked as RH, which means "recombination hotspot", a technical albeit interesting matter that they deal with at length in the supplementary materials. The conclusion seems to be that these recombination hotspots can distort the cluster analysis and should be dealt with in order to prevent confusing results.

Anyhow the most valid run for the global dataset is surely K=4, showing four neatly distinct clusters: Africans, Europeans (or West Eurasians), Amerindians and East Asians. The minor apparent "admixture" levels among Amerindians and some East Asians (Mongolians, Cambodians) may or may not mean admixture. In my opinion, based on comparison with other studies, they do not; rather they indicate a lesser degree of affinity with the main cluster and therefore some small degree of affinity with other Eurasians. Careful sampling strategies, such as the one used by Hui Li last year, may be needed to discern what exactly this means, if anything at all.

Most probably the "European affinity" apparent in these two cases just means some Central Eurasian (or Siberian) affinity of Mongols, as shown in Hui Li's paper, as well as in Amerindians, and some South Asian affinity of Cambodians, as detected for neighboring Thais in the recent paper by Jinchuan Xing.

When comparing East Asians alone (B), three clusters are apparent (K=3): a "Mongolian" one (blue), a SE Asian one (green), more intense in Cambodians than Vietnamese, and the middle East Asian one including Chinese, Korean and Japanese (and largely Vietnamese too). K=4 seems to indicate a diffuse (low level) Japanese-Korean affinity but this is better seen in the middle East Asian comparison.

In panel A (with only Chinese, Koreans and Japanese), we can see three clusters again (K=3): Chinese (green), Japanese (red) and Korean (blue). However most Koreans are not clearly differentiated from their neighbors, especially the Chinese, and only the insular population of Jeju shows a much stronger Korean-specific homogeneity (though still with some Japanese influence).

Figure 4: MDS and NJ Tree of Korean, Chinese and Japanese


MDS stands for multidimensional scaling, a way of visually presenting statistical data in two or more dimensions, somewhat similar to PCA. NJ stands for neighbor-joining, a method used to build affinity trees in genetics that you are probably familiar with.

According to the legend, in the MDS plot (A) the 1st (horizontal) dimension accounts for 90% of the variance, while the 2nd (vertical) dimension accounts for only 1%! If so, it could just as well be a linear plot, with most East Asians clustering to the left and a small odd group to the right. This "odd group" is mostly made up of SE Koreans, plus three Japanese Koreans (from Kobe) and one Kobe Japanese.

However this must be an error in the legend, and the vertical axis is without doubt the 1st dimension. Notice the scale, which is on the order of 1/1000 on the horizontal axis but 1/10 on the vertical one. Notice also how a China-Korea-Japan axis, with some eccentricity for Korea, is fully consistent with all the other data provided.
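The scale argument can be checked on toy data. The sketch below assumes classical (Torgerson) MDS, which may not be the exact variant used in the paper, and it uses made-up points; `classical_mds` and all variable names are my own, not the authors'. In that variant the dimension that accounts for the largest share of the variance is also the one with the widest coordinate spread, so an axis scaled in tenths cannot be the minor dimension next to one scaled in thousandths.

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS: coordinates plus variance share per dimension."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain an inner-product matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order, so sort them descending.
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    pos = np.clip(vals, 0.0, None)  # drop tiny negative values from rounding
    coords = vecs[:, :n_dims] * np.sqrt(pos[:n_dims])
    share = pos[:n_dims] / pos.sum()
    return coords, share

# Toy data (invented): wide spread along one axis, tiny spread along the other.
rng = np.random.default_rng(0)
points = np.column_stack([rng.normal(0, 0.1, 6), rng.normal(0, 0.001, 6)])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords, share = classical_mds(dist)
spread = coords.max(axis=0) - coords.min(axis=0)
print("variance share per dimension:", share)
print("coordinate spread per axis:  ", spread)
```

On data like this, the first dimension takes nearly all the variance and also has by far the larger axis range, which is the pattern one would expect the figure's scales to follow.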

The NJ tree indicates an apparent first divergence in three branches:

1. A Japanese-only branch (most Japanese fall here)
2. A mostly Korean branch, including also one Japanese and three Chinese
3. The major branch including most Koreans and Chinese, as well as two Japanese. This one, in turn splits in:

3a. Including the two remnant Japanese and all the rest being Koreans.
3b. A large branch including only Koreans and Chinese. It divides in two:

3b1. Almost only Koreans (only one Chinese here I think), with high incidence of SW Koreans.
3b2. Many Koreans too (with high incidence of SE Koreans) plus most Chinese, concentrated in a particular sub-branch (bottom of graph).
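For readers less familiar with neighbor-joining, here is a minimal sketch of how the method groups samples into a tree like the one described above. The function name `neighbor_joining`, the five labels and the distance matrix are illustrative assumptions (not data from the paper), and branch lengths are left out. Note that the method yields an unrooted topology, so the last three nodes simply meet at a central point.

```python
def neighbor_joining(names, dist):
    """Return an unrooted NJ topology as nested tuples (branch lengths omitted)."""
    nodes = list(names)
    # Copy the matrix into a dict of dicts so the caller's data is untouched.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    while len(nodes) > 3:
        n = len(nodes)
        total = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Q criterion: join the pair minimising (n - 2) * d(a, b) - r_a - r_b.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:  # ties keep the first pair found
                    best = (q, a, b)
        _, a, b = best
        new = (a, b)
        d[new] = {new: 0.0}
        # Distance from the new internal node to every remaining node.
        for c in nodes:
            if c != a and c != b:
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c != a and c != b] + [new]
    return tuple(nodes)

# Invented distances between five hypothetical samples.
labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(labels, matrix)
print(tree)  # -> ('d', 'e', ('c', ('a', 'b')))
```

The closest pair by the Q criterion ("a" and "b") is joined first, and each later join depends on the sums of distances over all remaining samples, which is one reason why an uneven number of samples per population can influence the resulting structure.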

Caution is needed with Korean-centric readings in any case, because Koreans are clearly oversampled, which is surely distorting the tree structure to at least some extent. However if this structure could find confirmation in further more balanced studies, it might well support a colonization process along the coast, which is pretty much mainstream these days.

Koreans anyhow mostly cluster with Chinese here too, which is consistent with the STRUCTURE analysis, showing some SW-SE (West-East?) internal polarity in the peninsula.

Notice that no sampling was undertaken, without doubt because of the political circumstances, in the northern half of the peninsula. Notice also that both Chinese samples are from the North (Beijing and Manchuria).

3 comments:

terryt said...

"However if this structure could find confirmation in further more balanced studies, it might well support a colonization process along the coast, which is pretty much mainstream these days".

But in which direction? As you say:

"The NJ tree indicates an apparent first divergence in three branches:

1. A Japanese-only branch (most Japanese fall here)
2. A mostly Korean branch, including also one Japanese and three Chinese
3. The major branch including most Koreans and Chinese, as well as two Japanese. This one, in turn splits in:

3a. Including the two remnant Japanese and all the rest being Koreans.
3b. A large branch including only Koreans and Chinese. It divides in two:

3b1. Almost only Koreans (only one Chinese here I think), with high incidence of SW Koreans.
3b2. Many Koreans too (with high incidence of SE Koreans) plus most Chinese, concentrated in a particular sub-branch (bottom of graph)".

So 'most Chinese' appear in the downstream division 3b2, whereas the first three-way split is 'A Japanese-only branch (most Japanese fall here)', 'A mostly Korean branch' and a third 'branch including most Koreans and Chinese'. It looks very much as though the Chinese (northern ones at least) spread from the north in a southerly direction.

Maju said...

Well, if you look at a map (for instance the map included in the paper), you'll see that Japan, Korea and North China are at the same latitude.

What I noticed was that of the three apparent top level branches two are exclusively coastal (Japan, Korea), and only the third is shared with the interior (China). But take this with a good dose of salt because of the already mentioned Korean oversampling: this is autosomal genetics, not haploid lineages - here sample size may matter a lot!

Maju said...

PS. Actually the Chinese sample sites are North (and West) relative to the Korean and Japanese ones.