PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes

Background. Metagenomics is an approach for characterizing environmental microbial communities in situ, it allows their functional and taxonomic characterization and to recover sequences from uncultured taxa. This is often achieved by a combination of sequence assembly and binning, where sequences a...

Bibliographic Details
Main authors: Gregor, Ivan, Dröge, Johannes, Schirmer, Melanie, Quince, Christopher, McHardy, Alice C.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4748697/
https://www.ncbi.nlm.nih.gov/pubmed/26870609
http://dx.doi.org/10.7717/peerj.1603
author Gregor, Ivan
Dröge, Johannes
Schirmer, Melanie
Quince, Christopher
McHardy, Alice C.
collection PubMed
description Background. Metagenomics is an approach for characterizing environmental microbial communities in situ, it allows their functional and taxonomic characterization and to recover sequences from uncultured taxa. This is often achieved by a combination of sequence assembly and binning, where sequences are grouped into ‘bins’ representing taxa of the underlying microbial community. Assignment to low-ranking taxonomic bins is an important challenge for binning methods as is scalability to Gb-sized datasets generated with deep sequencing techniques. One of the best available methods for species bins recovery from deep-branching phyla is the expert-trained PhyloPythiaS package, where a human expert decides on the taxa to incorporate in the model and identifies ‘training’ sequences based on marker genes directly from the sample. Due to the manual effort involved, this approach does not scale to multiple metagenome samples and requires substantial expertise, which researchers who are new to the area do not have. Results. We have developed PhyloPythiaS+, a successor to our PhyloPythia(S) software. The new (+) component performs the work previously done by the human expert. PhyloPythiaS+ also includes a new k-mer counting algorithm, which accelerated the simultaneous counting of 4–6-mers used for taxonomic binning 100-fold and reduced the overall execution time of the software by a factor of three. Our software allows to analyze Gb-sized metagenomes with inexpensive hardware, and to recover species or genera-level bins with low error rates in a fully automated fashion. PhyloPythiaS+ was compared to MEGAN, taxator-tk, Kraken and the generic PhyloPythiaS model. The results showed that PhyloPythiaS+ performs especially well for samples originating from novel environments in comparison to the other methods. Availability. PhyloPythiaS+ in a virtual machine is available for installation under Windows, Unix systems or OS X on: https://github.com/algbioi/ppsp/wiki.
format Online
Article
Text
id pubmed-4748697
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-4748697 2016-02-11 PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes Gregor, Ivan Dröge, Johannes Schirmer, Melanie Quince, Christopher McHardy, Alice C. PeerJ Bioinformatics PeerJ Inc. 2016-02-08 /pmc/articles/PMC4748697/ /pubmed/26870609 http://dx.doi.org/10.7717/peerj.1603 Text en ©2016 Gregor et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4748697/
https://www.ncbi.nlm.nih.gov/pubmed/26870609
http://dx.doi.org/10.7717/peerj.1603