A performance study of the impact of recombination on species tree analysis
BACKGROUND: The most widely used state-of-the-art methods for reconstructing species phylogenies from genomic sequence data assume that sampled loci are identically and independently distributed. In principle, free recombination between loci and a lack of intra-locus recombination are necessary to s...
Main authors: | Wang, Zhiwei; Liu, Kevin J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123380/ https://www.ncbi.nlm.nih.gov/pubmed/28185556 http://dx.doi.org/10.1186/s12864-016-3104-5 |
_version_ | 1782469724429352960 |
---|---|
author | Wang, Zhiwei Liu, Kevin J. |
author_facet | Wang, Zhiwei Liu, Kevin J. |
author_sort | Wang, Zhiwei |
collection | PubMed |
description | BACKGROUND: The most widely used state-of-the-art methods for reconstructing species phylogenies from genomic sequence data assume that sampled loci are identically and independently distributed. In principle, free recombination between loci and a lack of intra-locus recombination are necessary to satisfy this assumption. Few studies have quantified the practical impact of recombination on species tree inference methods, and even fewer have used genomic sequence data for this purpose. One prominent exception is the 2012 study of Lanier and Knowles. A main finding from the study was that species tree inference methods are relatively robust to intra-locus recombination, assuming free recombination between loci. The latter assumption means that the open question regarding the impact of recombination on species tree analysis is not fully resolved. RESULTS: The goal of this study is to further investigate this open question. Using simulations based upon the multi-species coalescent-with-recombination model as well as empirical datasets, we compared common pipeline-based techniques for inferring species phylogenies. The simulation conditions included a range of dataset sizes and several choices for recombination rate which was either uniform across loci or incorporated recombination hotspots. We found that pipelines which explicitly utilize inferred recombination breakpoints to delineate recombination-free intervals result in greater accuracy compared to widely used alternatives that preprocess sequences based upon linkage disequilibrium decay. Furthermore, the use of a relatively simple approach for recombination breakpoint inference does not degrade the accuracy of downstream species tree inference compared to more accurate alternatives. CONCLUSIONS: Our findings clarify the impact of recombination upon current phylogenomic pipelines for species tree inference. Pipeline-based approaches which utilize inferred recombination breakpoints to densely sample loci across genomic sequences can tolerate intra-locus recombination and violations of the assumption of free recombination between loci. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3104-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5123380 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-51233802016-12-08 A performance study of the impact of recombination on species tree analysis Wang, Zhiwei Liu, Kevin J. BMC Genomics Research BACKGROUND: The most widely used state-of-the-art methods for reconstructing species phylogenies from genomic sequence data assume that sampled loci are identically and independently distributed. In principle, free recombination between loci and a lack of intra-locus recombination are necessary to satisfy this assumption. Few studies have quantified the practical impact of recombination on species tree inference methods, and even fewer have used genomic sequence data for this purpose. One prominent exception is the 2012 study of Lanier and Knowles. A main finding from the study was that species tree inference methods are relatively robust to intra-locus recombination, assuming free recombination between loci. The latter assumption means that the open question regarding the impact of recombination on species tree analysis is not fully resolved. RESULTS: The goal of this study is to further investigate this open question. Using simulations based upon the multi-species coalescent-with-recombination model as well as empirical datasets, we compared common pipeline-based techniques for inferring species phylogenies. The simulation conditions included a range of dataset sizes and several choices for recombination rate which was either uniform across loci or incorporated recombination hotspots. We found that pipelines which explicitly utilize inferred recombination breakpoints to delineate recombination-free intervals result in greater accuracy compared to widely used alternatives that preprocess sequences based upon linkage disequilibrium decay. Furthermore, the use of a relatively simple approach for recombination breakpoint inference does not degrade the accuracy of downstream species tree inference compared to more accurate alternatives. CONCLUSIONS: Our findings clarify the impact of recombination upon current phylogenomic pipelines for species tree inference. Pipeline-based approaches which utilize inferred recombination breakpoints to densely sample loci across genomic sequences can tolerate intra-locus recombination and violations of the assumption of free recombination between loci. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3104-5) contains supplementary material, which is available to authorized users. BioMed Central 2016-11-11 /pmc/articles/PMC5123380/ /pubmed/28185556 http://dx.doi.org/10.1186/s12864-016-3104-5 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Wang, Zhiwei Liu, Kevin J. A performance study of the impact of recombination on species tree analysis |
title | A performance study of the impact of recombination on species tree analysis |
title_full | A performance study of the impact of recombination on species tree analysis |
title_fullStr | A performance study of the impact of recombination on species tree analysis |
title_full_unstemmed | A performance study of the impact of recombination on species tree analysis |
title_short | A performance study of the impact of recombination on species tree analysis |
title_sort | performance study of the impact of recombination on species tree analysis |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123380/ https://www.ncbi.nlm.nih.gov/pubmed/28185556 http://dx.doi.org/10.1186/s12864-016-3104-5 |
work_keys_str_mv | AT wangzhiwei aperformancestudyoftheimpactofrecombinationonspeciestreeanalysis AT liukevinj aperformancestudyoftheimpactofrecombinationonspeciestreeanalysis AT wangzhiwei performancestudyoftheimpactofrecombinationonspeciestreeanalysis AT liukevinj performancestudyoftheimpactofrecombinationonspeciestreeanalysis |