
Rapid identification of high-confidence taxonomic assignments for metagenomic data

Determining the taxonomic lineage of DNA sequences is an important step in metagenomic analysis. Short DNA fragments from next-generation sequencing projects and microbes that lack close relatives in reference sequenced genome databases pose significant problems to taxonomic attribution methods. Our...


Bibliographic Details
Main authors: MacDonald, Norman J., Parks, Donovan H., Beiko, Robert G.
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3413139/
https://www.ncbi.nlm.nih.gov/pubmed/22532608
http://dx.doi.org/10.1093/nar/gks335
_version_ 1782240033488502784
author MacDonald, Norman J.
Parks, Donovan H.
Beiko, Robert G.
collection PubMed
description Determining the taxonomic lineage of DNA sequences is an important step in metagenomic analysis. Short DNA fragments from next-generation sequencing projects and microbes that lack close relatives in reference sequenced genome databases pose significant problems to taxonomic attribution methods. Our new classification algorithm, RITA (Rapid Identification of Taxonomic Assignments), uses the agreement between composition and homology to accurately classify sequences as short as 50 nt in length by assigning them to different classification groups with varying degrees of confidence. RITA is much faster than the hybrid PhymmBL approach when comparable homology search algorithms are used, and achieves slightly better accuracy than PhymmBL on an artificial metagenome. RITA can also incorporate prior knowledge about taxonomic distributions to increase the accuracy of assignments in data sets with varying degrees of taxonomic novelty, and classified sequences with higher precision than the current best rank-flexible classifier. The accuracy on short reads can be increased by exploiting paired-end information, if available, which we demonstrate on a recently published bovine rumen data set. Finally, we develop a variant of RITA that incorporates accelerated homology search techniques, and generate predictions on a set of human gut metagenomes that were previously assigned to different ‘enterotypes’. RITA is freely available in Web server and standalone versions.
format Online
Article
Text
id pubmed-3413139
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3413139 2012-08-07 Rapid identification of high-confidence taxonomic assignments for metagenomic data MacDonald, Norman J. Parks, Donovan H. Beiko, Robert G. Nucleic Acids Res Methods Online Oxford University Press 2012-08 2012-04-24 /pmc/articles/PMC3413139/ /pubmed/22532608 http://dx.doi.org/10.1093/nar/gks335 Text en © The Author(s) 2012. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Rapid identification of high-confidence taxonomic assignments for metagenomic data
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3413139/
https://www.ncbi.nlm.nih.gov/pubmed/22532608
http://dx.doi.org/10.1093/nar/gks335