GraftM: a tool for scalable, phylogenetically informed classification of genes within metagenomes

Bibliographic Details
Main Authors: Boyd, Joel A; Woodcroft, Ben J; Tyson, Gene W
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007438/
https://www.ncbi.nlm.nih.gov/pubmed/29562347
http://dx.doi.org/10.1093/nar/gky174
collection PubMed
description Large-scale metagenomic datasets enable the recovery of hundreds of population genomes from environmental samples. However, these genomes do not typically represent the full diversity of complex microbial communities. Gene-centric approaches can be used to gain a comprehensive view of diversity by examining each read independently, but traditional pairwise comparison approaches typically over-classify taxonomy and scale poorly with increasing metagenome and database sizes. Here we introduce GraftM, a tool that uses gene-specific packages (gpkgs) to rapidly identify gene families in metagenomic data using hidden Markov models (HMMs) or DIAMOND databases, and classifies these sequences using placement into pre-constructed gene trees. The speed and accuracy of GraftM were benchmarked with in silico and in vitro mock communities using taxonomic markers; GraftM was found to have higher accuracy at the family level, with a processing time 2.0–3.7× faster than currently available software. Exploration of a wetland metagenome using 16S rRNA- and methyl-coenzyme M reductase (McrA)-specific gpkgs revealed taxonomic and functional shifts across a depth gradient. Analysis of the NCBI nr database using the McrA gpkg allowed the detection of novel sequences belonging to phylum-level lineages. A growing collection of gpkgs is available online (https://github.com/geronimp/graftM_gpkgs), where curated packages can be uploaded and exchanged.
id pubmed-6007438
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6007438 2018-07-05 Nucleic Acids Res Methods Online Oxford University Press 2018-06-01 2018-03-19 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Methods Online