A Statistical Framework for Accurate Taxonomic Assignment of Metagenomic Sequencing Reads

Bibliographic Details
Main authors: Jiang, Hongmei, An, Lingling, Lin, Simon M., Feng, Gang, Qiu, Yuqing
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3462201/
https://www.ncbi.nlm.nih.gov/pubmed/23049702
http://dx.doi.org/10.1371/journal.pone.0046450
author Jiang, Hongmei
An, Lingling
Lin, Simon M.
Feng, Gang
Qiu, Yuqing
collection PubMed
description The advent of next-generation sequencing technologies has greatly advanced the field of metagenomics, which studies genetic material recovered directly from an environment. Characterizing the genomic composition of a metagenomic sample is essential for understanding the structure of the microbial community. The multiple genomes contained in a metagenomic sample can be identified and quantified through homology searches of sequence reads against known sequences catalogued in reference databases. Traditionally, reads with multiple genomic hits are assigned to non-specific or high ranks of the taxonomy tree, which compromises the accuracy of relative abundance estimates for the genomes present in a sample. Instead of assigning reads one by one to the taxonomy tree, as many existing methods do, we propose a statistical framework that models the candidate genomes to which sequence reads have hits. Once the proportion of reads generated by each genome has been estimated, sequence reads are assigned to the candidate genomes and the taxonomy tree according to an estimated probability that takes into account both sequence alignment scores and estimated genome abundance. The proposed method is comprehensively tested on simulated datasets and two real datasets. It assigns reads to the low taxonomic ranks very accurately. Our statistical approach to taxonomic assignment of metagenomic reads, TAMER, is implemented in R and available at http://faculty.wcas.northwestern.edu/hji403/MetaR.htm.
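The two-stage idea in the description (first estimate the proportion of reads generated by each candidate genome, then assign each read using both its alignment evidence and those proportions) can be illustrated with a minimal sketch. The abstract does not specify the model, so this assumes a finite mixture fitted by expectation-maximization (EM); it is not the TAMER implementation, which is written in R. The function names `estimate_proportions` and `assign_reads`, and the `weights` matrix of likelihood-like values derived from alignment scores, are hypothetical.

```python
from typing import List, Optional


def estimate_proportions(weights: List[List[float]],
                         n_iter: int = 200,
                         tol: float = 1e-10) -> List[float]:
    """Estimate genome proportions by EM (an assumed model, not TAMER itself).

    weights[i][j] is a hypothetical non-negative, likelihood-like value for
    read i against candidate genome j, and 0.0 where the read has no hit.
    """
    if not weights or not weights[0]:
        raise ValueError("weights must be a non-empty reads x genomes matrix")
    n_genomes = len(weights[0])
    pi = [1.0 / n_genomes] * n_genomes  # start from uniform abundances
    for _ in range(n_iter):
        counts = [0.0] * n_genomes
        for row in weights:
            total = sum(p * w for p, w in zip(pi, row))
            if total == 0.0:
                continue  # read with no usable hit contributes nothing
            # E-step: posterior probability that this read came from genome j
            for j in range(n_genomes):
                counts[j] += pi[j] * row[j] / total
        norm = sum(counts)
        if norm == 0.0:
            raise ValueError("no read has a hit to any candidate genome")
        # M-step: proportions are the normalized expected read counts
        new_pi = [c / norm for c in counts]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


def assign_reads(weights: List[List[float]],
                 pi: List[float]) -> List[Optional[int]]:
    """Assign each read to the genome with the highest posterior probability.

    Returns the genome index per read, or None for reads without any hit.
    """
    assignments: List[Optional[int]] = []
    for row in weights:
        post = [p * w for p, w in zip(pi, row)]
        if sum(post) == 0.0:
            assignments.append(None)
            continue
        assignments.append(max(range(len(post)), key=lambda j: post[j]))
    return assignments


# Toy data: three reads hit only genome 0, one hits only genome 1,
# and two hit both genomes equally well.
toy_weights = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0], [1.0, 1.0]]
toy_pi = estimate_proportions(toy_weights)
toy_assignments = assign_reads(toy_weights, toy_pi)
```

In this toy case the two ambiguous reads have identical alignment evidence for both genomes, so the estimated abundances decide their assignment. Under the assumed model, that is how a read with multiple hits can be placed on a specific genome rather than a higher taxonomic rank.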
format Online
Article
Text
id pubmed-3462201
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3462201 2012-10-05 PLoS One Research Article Public Library of Science 2012-10-01 /pmc/articles/PMC3462201/ /pubmed/23049702 http://dx.doi.org/10.1371/journal.pone.0046450 Text en © 2012 Jiang et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Statistical Framework for Accurate Taxonomic Assignment of Metagenomic Sequencing Reads
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3462201/
https://www.ncbi.nlm.nih.gov/pubmed/23049702
http://dx.doi.org/10.1371/journal.pone.0046450