Bryan Kolaczkowski & Joseph W. Thornton, Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics. PLoS ONE, 2009. Open access.
Abstract
Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias—which is apparent under both controlled simulation conditions and in analyses of empirical sequence data—also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages—that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis.
Bold type is mine. I think the paper needs no further comment; at least, I can't think of anything to add right now.
4 comments:
Long-branch attraction is probably what's wrong with both mtDNA and Y-DNA trees: African-specific lineages (Y-A, Y-B, mt-L1, mt-L2, etc.) evolved at a faster rate and created homoplasies that look like ancient synapomorphies.
This is different, German. It applies to autosomal DNA. Haploid DNA has its own rules, based not on overall affinity but on specific, hierarchically organized SNPs.
Why would that be different? Long-branch attraction is a bias that spans any genetic system (A, G, T, C are the only 4 options everywhere) and, to a varying degree, affects all the computerized phylogenetic methods, from maximum likelihood to Bayesian.
Because you don't use Bayesian approximations to build such trees. You don't need them.
You only use such statistical approximations to deal with huge amounts of distributed data that are not manageable by any other means. That happens with autosomal or whole-genome data only, not with the limited number of haploid SNPs, which you can deal with "manually".
The haploid phylogenetic trees are not statistical but factual, as far as research can tell. You claim to know about genetics but you don't know shit. There's nothing statistical in haploid phylogenetics except the samples.