RAIphy: Phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles

Bibliographic Details
Main authors: Nalbantoglu, Ozkan U; Way, Samuel F; Hinrichs, Steven H; Sayood, Khalid
Format: Text
Language: English
Published: BioMed Central, 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3038895/
https://www.ncbi.nlm.nih.gov/pubmed/21281493
http://dx.doi.org/10.1186/1471-2105-12-41

Abstract
BACKGROUND: Computational analysis of metagenomes requires the taxonomic assignment of genome contigs assembled from DNA reads of environmental samples. Because microbiomes are diverse, the assemblies obtained can range in length from a few hundred bp to a few hundred kbp. Current taxonomic classification algorithms are accurate for long contigs, or for short fragments from organisms whose close relatives have annotated genomes. Given the complexity of microbiomes and the paucity of annotated genomes, these are significant limitations for metagenome analysis.
RESULTS: We propose a robust taxonomic classification method, RAIphy, which uses a novel sequence similarity metric with iterative refinement of taxonomic models and works effectively without these limitations. We tested RAIphy on synthetic metagenomic data with sequence lengths from 100 bp to 50 kbp. For reads of 100 bp to 1000 bp, the sensitivity of RAIphy ranges from 38% to 81%, outperforming the currently popular composition-based methods in this range. Comparison with computationally more intensive sequence-similarity methods shows that RAIphy performs competitively while being significantly faster. For relatively long contigs, its sensitivity-specificity characteristics were compared with those of the PhyloPythia and TACOA algorithms, and RAIphy performs better than both at various clade levels. For an acid mine drainage (AMD) metagenome, RAIphy binned the sequence read set taxonomically more accurately than the currently available methods Phymm and MEGAN, and more accurately than the much more computationally intensive PhymmBL in two of three tests.
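The results are stated as per-clade sensitivity and specificity. The Python sketch below shows how such scores can be computed for a binning run, assuming a convention common in the binning literature: sensitivity is the fraction of sequences truly from a clade that are assigned to it, and specificity is the fraction of sequences assigned to a clade that truly belong to it (a precision-like ratio, not the true-negative rate). The exact definitions used in the RAIphy evaluation may differ; the function name `clade_sensitivity_specificity` and the labels "X" and "Y" are hypothetical.

```python
def clade_sensitivity_specificity(true_labels, predicted_labels, clade):
    """Per-clade binning scores under the assumed convention (hypothetical helper).

    sensitivity = correct assignments to clade / sequences truly from clade
    specificity = correct assignments to clade / sequences assigned to clade
    Unassigned sequences may be given as None in predicted_labels.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for t, p in pairs if t == clade and p == clade)
    actual = sum(1 for t, _ in pairs if t == clade)
    assigned = sum(1 for _, p in pairs if p == clade)
    # nan signals an undefined ratio (no true members or no assignments).
    sensitivity = correct / actual if actual else float("nan")
    specificity = correct / assigned if assigned else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    truth = ["X", "X", "X", "X", "Y", "Y"]
    predicted = ["X", "X", "Y", None, "X", "Y"]
    print(clade_sensitivity_specificity(truth, predicted, "X"))  # (0.5, 0.6666666666666666)
```

Under this convention, leaving a read unassigned (None) lowers sensitivity but not specificity, which is why a classifier can trade one against the other.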
CONCLUSIONS: With the introduction of the relative abundance index (RAI) metric and an iterative classification method, we propose a taxonomic classification algorithm that performs competitively across a wide range of contig lengths assembled from metagenome data. Because of its speed, simplicity, and accuracy, RAIphy can be used successfully for binning a broad range of metagenomic data obtained from environmental samples.
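The abstract names two ingredients: the relative abundance index metric and iterative refinement of taxon models. The Python sketch below illustrates one plausible reading under explicit assumptions: each taxon profile stores n-mer, (n-1)-mer and (n-2)-mer counts; an n-mer is scored by the log ratio of its order-(n-1) to its order-(n-2) Markov prediction, with additive pseudocounts; a read's score is the average over its n-mers; and refinement repeatedly assigns each read to its best-scoring profile, folds the assigned reads back into that profile, and stops when the assignments no longer change. The names `RaiProfile`, `best_profile` and `refine`, the default n=7, the pseudocount and the stopping rule are all hypothetical; RAIphy's exact definitions and parameters should be taken from the paper.

```python
from collections import Counter
from math import log

# Hypothetical sketch of an RAI-style classifier; not the RAIphy implementation.


def kmer_counts(seqs, k):
    """Count k-mers in ACGT-only windows; k-mers never span two sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            if all(b in "ACGT" for b in w):
                counts[w] += 1
    return counts


class RaiProfile:
    """Per-taxon n-, (n-1)- and (n-2)-mer counts (n=7 by default is an assumption)."""

    def __init__(self, reference_seqs, n=7, pseudo=1.0):
        if n < 3 or pseudo <= 0:
            raise ValueError("need n >= 3 and pseudo > 0")
        self.n, self.pseudo = n, pseudo
        self.reference = list(reference_seqs)
        self.set_reads([])

    def set_reads(self, reads):
        # Rebuild counts from the reference sequences plus the reads currently
        # assigned to this taxon (the refinement step).
        seqs = self.reference + list(reads)
        self.counts = {m: kmer_counts(seqs, m) for m in (self.n, self.n - 1, self.n - 2)}

    def _log_cond(self, w):
        # log P(last base of w | its preceding bases), with additive pseudocounts.
        num = self.counts[len(w)][w] + self.pseudo
        den = self.counts[len(w) - 1][w[:-1]] + 4 * self.pseudo
        return log(num / den)

    def rho(self, w):
        # Assumed log relative abundance of n-mer w: order-(n-1) vs order-(n-2) prediction.
        return self._log_cond(w) - self._log_cond(w[1:])

    def score(self, read):
        # Average rho over the read's ACGT-only n-mers; -inf if it has none.
        read = read.upper()
        windows = (read[i:i + self.n] for i in range(len(read) - self.n + 1))
        vals = [self.rho(w) for w in windows if all(b in "ACGT" for b in w)]
        return sum(vals) / len(vals) if vals else float("-inf")


def best_profile(profiles, read):
    """Name of the profile with the highest score for this read."""
    return max(profiles, key=lambda name: profiles[name].score(read))


def refine(profiles, reads, max_iter=10):
    """Assign reads, fold them into their profiles, repeat until assignments are stable."""
    assignment = None
    for _ in range(max_iter):
        new = [best_profile(profiles, r) for r in reads]
        if new == assignment:
            break
        assignment = new
        for name, profile in profiles.items():
            profile.set_reads([r for r, a in zip(reads, assignment) if a == name])
    return assignment


if __name__ == "__main__":
    # Toy periodic "genomes"; n=3 keeps the example small.
    profiles = {
        "A": RaiProfile(["ACGTCA" * 20], n=3),
        "B": RaiProfile(["ACATCG" * 20], n=3),
    }
    print(refine(profiles, ["ACGTCA", "ACATCG"]))  # ['A', 'B']
```

In the toy run, each read scores positive against the profile it was drawn from and negative against the other, because the base that follows a given two-base context differs between the two sequences. Because the score compares two context orders, an n-mer absent from a profile together with its (n-1)-mer context contributes roughly zero rather than a large penalty.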

Journal: BMC Bioinformatics (Research Article), BioMed Central, published 2011-01-31
Copyright ©2011 Nalbantoglu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.