
A performance study of the impact of recombination on species tree analysis

BACKGROUND: The most widely used state-of-the-art methods for reconstructing species phylogenies from genomic sequence data assume that sampled loci are identically and independently distributed. In principle, free recombination between loci and a lack of intra-locus recombination are necessary to s...

Bibliographic Details
Main authors: Wang, Zhiwei; Liu, Kevin J.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123380/
https://www.ncbi.nlm.nih.gov/pubmed/28185556
http://dx.doi.org/10.1186/s12864-016-3104-5
author Wang, Zhiwei
Liu, Kevin J.
author_facet Wang, Zhiwei
Liu, Kevin J.
author_sort Wang, Zhiwei
collection PubMed
description BACKGROUND: The most widely used state-of-the-art methods for reconstructing species phylogenies from genomic sequence data assume that sampled loci are identically and independently distributed. In principle, free recombination between loci and a lack of intra-locus recombination are necessary to satisfy this assumption. Few studies have quantified the practical impact of recombination on species tree inference methods, and even fewer have used genomic sequence data for this purpose. One prominent exception is the 2012 study of Lanier and Knowles. A main finding from the study was that species tree inference methods are relatively robust to intra-locus recombination, assuming free recombination between loci. The latter assumption means that the open question regarding the impact of recombination on species tree analysis is not fully resolved. RESULTS: The goal of this study is to further investigate this open question. Using simulations based upon the multi-species coalescent-with-recombination model as well as empirical datasets, we compared common pipeline-based techniques for inferring species phylogenies. The simulation conditions included a range of dataset sizes and several choices for recombination rate which was either uniform across loci or incorporated recombination hotspots. We found that pipelines which explicitly utilize inferred recombination breakpoints to delineate recombination-free intervals result in greater accuracy compared to widely used alternatives that preprocess sequences based upon linkage disequilibrium decay. Furthermore, the use of a relatively simple approach for recombination breakpoint inference does not degrade the accuracy of downstream species tree inference compared to more accurate alternatives. CONCLUSIONS: Our findings clarify the impact of recombination upon current phylogenomic pipelines for species tree inference. Pipeline-based approaches which utilize inferred recombination breakpoints to densely sample loci across genomic sequences can tolerate intra-locus recombination and violations of the assumption of free recombination between loci. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3104-5) contains supplementary material, which is available to authorized users.
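The abstract's idea of using inferred recombination breakpoints to delineate recombination-free intervals can be illustrated with a minimal sketch. This is not the authors' pipeline: the breakpoint positions are assumed to come from some external inference step, the alignment is assumed to be a plain mapping from taxon name to aligned sequence, and the function names `recombination_free_intervals` and `split_alignment` are hypothetical. Each resulting sub-alignment could then be treated as one locus by a downstream species tree method.

```python
from typing import Dict, List, Tuple


def recombination_free_intervals(
    length: int, breakpoints: List[int]
) -> List[Tuple[int, int]]:
    """Split [0, length) into half-open intervals at the given breakpoints."""
    # Keep only breakpoints strictly inside the alignment, deduplicated and sorted.
    cuts = sorted({b for b in breakpoints if 0 < b < length})
    edges = [0] + cuts + [length]
    return list(zip(edges[:-1], edges[1:]))


def split_alignment(
    alignment: Dict[str, str], breakpoints: List[int]
) -> List[Dict[str, str]]:
    """Cut every aligned sequence at the same breakpoints (one locus per interval)."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    return [
        {taxon: seq[start:end] for taxon, seq in alignment.items()}
        for start, end in recombination_free_intervals(length, breakpoints)
    ]


if __name__ == "__main__":
    # Toy alignment with two assumed breakpoints (illustrative values only).
    toy = {"taxonA": "AAAACCCGGG", "taxonB": "AAAATTTGGG"}
    for locus in split_alignment(toy, [4, 7]):
        print(locus)
```

Half-open intervals are used so that adjacent loci share no columns and together cover the whole alignment.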
format Online
Article
Text
id pubmed-5123380
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-51233802016-12-08 A performance study of the impact of recombination on species tree analysis Wang, Zhiwei Liu, Kevin J. BMC Genomics Research BACKGROUND: The most widely used state-of-the-art methods for reconstructing species phylogenies from genomic sequence data assume that sampled loci are identically and independently distributed. In principle, free recombination between loci and a lack of intra-locus recombination are necessary to satisfy this assumption. Few studies have quantified the practical impact of recombination on species tree inference methods, and even fewer have used genomic sequence data for this purpose. One prominent exception is the 2012 study of Lanier and Knowles. A main finding from the study was that species tree inference methods are relatively robust to intra-locus recombination, assuming free recombination between loci. The latter assumption means that the open question regarding the impact of recombination on species tree analysis is not fully resolved. RESULTS: The goal of this study is to further investigate this open question. Using simulations based upon the multi-species coalescent-with-recombination model as well as empirical datasets, we compared common pipeline-based techniques for inferring species phylogenies. The simulation conditions included a range of dataset sizes and several choices for recombination rate which was either uniform across loci or incorporated recombination hotspots. We found that pipelines which explicitly utilize inferred recombination breakpoints to delineate recombination-free intervals result in greater accuracy compared to widely used alternatives that preprocess sequences based upon linkage disequilibrium decay. Furthermore, the use of a relatively simple approach for recombination breakpoint inference does not degrade the accuracy of downstream species tree inference compared to more accurate alternatives. 
CONCLUSIONS: Our findings clarify the impact of recombination upon current phylogenomic pipelines for species tree inference. Pipeline-based approaches which utilize inferred recombination breakpoints to densely sample loci across genomic sequences can tolerate intra-locus recombination and violations of the assumption of free recombination between loci. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3104-5) contains supplementary material, which is available to authorized users. BioMed Central 2016-11-11 /pmc/articles/PMC5123380/ /pubmed/28185556 http://dx.doi.org/10.1186/s12864-016-3104-5 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Wang, Zhiwei
Liu, Kevin J.
A performance study of the impact of recombination on species tree analysis
title A performance study of the impact of recombination on species tree analysis
title_sort performance study of the impact of recombination on species tree analysis
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123380/
https://www.ncbi.nlm.nih.gov/pubmed/28185556
http://dx.doi.org/10.1186/s12864-016-3104-5