
Multi-rate Poisson tree processes for single-locus species delimitation under maximum likelihood and Markov chain Monte Carlo

MOTIVATION: In recent years, molecular species delimitation has become a routine approach for quantifying and classifying biodiversity. Barcoding methods are of particular importance in large-scale surveys as they promote fast species discovery and biodiversity estimates. Among those, distance-based...


Bibliographic Details
Main authors: Kapli, P, Lutteropp, S, Zhang, J, Kobert, K, Pavlidis, P, Stamatakis, A, Flouri, T
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5447239/
https://www.ncbi.nlm.nih.gov/pubmed/28108445
http://dx.doi.org/10.1093/bioinformatics/btx025
author Kapli, P
Lutteropp, S
Zhang, J
Kobert, K
Pavlidis, P
Stamatakis, A
Flouri, T
collection PubMed
description MOTIVATION: In recent years, molecular species delimitation has become a routine approach for quantifying and classifying biodiversity. Barcoding methods are of particular importance in large-scale surveys as they promote fast species discovery and biodiversity estimates. Among those, distance-based methods are the most common choice as they scale well with large datasets; however, they are sensitive to similarity threshold parameters and they ignore evolutionary relationships. The recently introduced “Poisson Tree Processes” (PTP) method is a phylogeny-aware approach that does not rely on such thresholds. Yet, two weaknesses of PTP impact its accuracy and practicality when applied to large datasets: it does not account for divergent intraspecific variation and is slow for a large number of sequences. RESULTS: We introduce the multi-rate PTP (mPTP), an improved method that alleviates the theoretical and technical shortcomings of PTP. It incorporates different levels of intraspecific genetic diversity deriving from differences in either the evolutionary history or sampling of each species. Results on empirical data suggest that mPTP is superior to PTP and popular distance-based methods as it consistently yields more accurate delimitations with respect to the taxonomy (i.e., identifies more taxonomic species, infers species numbers closer to the taxonomy). Moreover, mPTP does not require any similarity threshold as input. The novel dynamic programming algorithm attains a speedup of at least five orders of magnitude compared to PTP, allowing it to delimit species in large (meta-)barcoding data. In addition, Markov Chain Monte Carlo sampling provides a comprehensive evaluation of the inferred delimitation in just a few seconds for millions of steps, independently of tree size. AVAILABILITY AND IMPLEMENTATION: mPTP is implemented in C and is available for download at http://github.com/Pas-Kapli/mptp under the GNU Affero 3 license.
A web-service is available at http://mptp.h-its.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5447239
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5447239 2017-05-31 Multi-rate Poisson tree processes for single-locus species delimitation under maximum likelihood and Markov chain Monte Carlo Kapli, P Lutteropp, S Zhang, J Kobert, K Pavlidis, P Stamatakis, A Flouri, T Bioinformatics Original Papers Oxford University Press 2017-06-01 2017-01-20 /pmc/articles/PMC5447239/ /pubmed/28108445 http://dx.doi.org/10.1093/bioinformatics/btx025 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Multi-rate Poisson tree processes for single-locus species delimitation under maximum likelihood and Markov chain Monte Carlo
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5447239/
https://www.ncbi.nlm.nih.gov/pubmed/28108445
http://dx.doi.org/10.1093/bioinformatics/btx025