
Accurate genome relative abundance estimation for closely related species in a metagenomic sample



Bibliographic Details
Main Authors: Sohn, Michael B, An, Lingling, Pookhao, Naruekamol, Li, Qike
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Methodology Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4131027/
https://www.ncbi.nlm.nih.gov/pubmed/25027647
http://dx.doi.org/10.1186/1471-2105-15-242
author Sohn, Michael B
An, Lingling
Pookhao, Naruekamol
Li, Qike
collection PubMed
description BACKGROUND: Metagenomics has great potential to reveal previously unattainable information about microbial communities. An important prerequisite for such discoveries is an accurate estimate of the composition of microbial communities. Most prevalent homology-based approaches rely solely on the results of an alignment tool such as BLAST, which limits their estimation accuracy to high ranks of the taxonomy tree.
RESULTS: We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which utilizes the similarity in the genomic sequence in addition to the result of an alignment tool. The proposed method is comprehensively tested on various simulated benchmark datasets with microbial structures of diverse complexity. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in quantifying the genomes in a given microbial sample. We also applied TAEC to two real metagenomic datasets, an oral cavity dataset and a Crohn’s disease dataset. Our results, while agreeing with previous findings at higher ranks of the taxonomy tree, provide accurate estimates of taxonomic composition at the species/strain level, narrowing down which species/strains need more attention in the study of the oral cavity and Crohn’s disease.
CONCLUSIONS: By taking account of the similarity in the genomic sequence, TAEC outperforms other available tools in estimating taxonomic composition at a very low rank, especially when closely related species/strains exist in a metagenomic sample.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2105-15-242) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4131027
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4131027 2014-08-15 BMC Bioinformatics Methodology Article BioMed Central 2014-07-16 /pmc/articles/PMC4131027/ /pubmed/25027647 http://dx.doi.org/10.1186/1471-2105-15-242 Text en
© Sohn et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Accurate genome relative abundance estimation for closely related species in a metagenomic sample
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4131027/
https://www.ncbi.nlm.nih.gov/pubmed/25027647
http://dx.doi.org/10.1186/1471-2105-15-242