MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks
Motivation: Metagenomics is a recent field of biology that studies microbial communities by analyzing their genomic content directly sequenced from the environment. A metagenomic dataset consists of many short DNA or RNA fragments called reads. One interesting problem in metagenomic data analysis is...
Main authors: | Gori, Fabio; Folino, Gianluigi; Jetten, Mike S. M.; Marchiori, Elena |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3018814/ https://www.ncbi.nlm.nih.gov/pubmed/21127032 http://dx.doi.org/10.1093/bioinformatics/btq649 |
_version_ | 1782196122075267072 |
---|---|
author | Gori, Fabio; Folino, Gianluigi; Jetten, Mike S. M.; Marchiori, Elena |
author_facet | Gori, Fabio; Folino, Gianluigi; Jetten, Mike S. M.; Marchiori, Elena |
author_sort | Gori, Fabio |
collection | PubMed |
description | Motivation: Metagenomics is a recent field of biology that studies microbial communities by analyzing their genomic content directly sequenced from the environment. A metagenomic dataset consists of many short DNA or RNA fragments called reads. One interesting problem in metagenomic data analysis is the discovery of the taxonomic composition of a given dataset. A simple method for this task, called the Lowest Common Ancestor (LCA), is employed in state-of-the-art computational tools for metagenomic data analysis of very short reads (about 100 bp). However, LCA has two main drawbacks: it may assign many reads to high taxonomic ranks, and it discards a large number of reads. Results: We present MTR, a new method for tackling these drawbacks using clustering at Multiple Taxonomic Ranks. Unlike LCA, which processes the reads one by one, MTR exploits information shared by reads. Specifically, MTR consists of two main phases. First, for each taxonomic rank, a collection of potential clusters of reads is generated, and each potential cluster is associated with a taxon at that rank. Next, a small number of clusters is selected at each rank using a combinatorial optimization algorithm. The effectiveness of the resulting method is tested on a large number of simulated and real-life metagenomes. Results of experiments show that MTR improves on LCA by discarding significantly fewer reads and by assigning many more reads to lower taxonomic ranks. Moreover, MTR provides a more faithful taxonomic characterization of the metagenome population distribution. Availability: Matlab and C++ source code for the method is available at http://cs.ru.nl/~gori/software/MTR.tar.gz. Contact: gori@cs.ru.nl; elenam@cs.ru.nl Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Text |
id | pubmed-3018814 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3018814 2011-01-12 MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks Gori, Fabio Folino, Gianluigi Jetten, Mike S. M. Marchiori, Elena Bioinformatics Original Papers Motivation: Metagenomics is a recent field of biology that studies microbial communities by analyzing their genomic content directly sequenced from the environment. A metagenomic dataset consists of many short DNA or RNA fragments called reads. One interesting problem in metagenomic data analysis is the discovery of the taxonomic composition of a given dataset. A simple method for this task, called the Lowest Common Ancestor (LCA), is employed in state-of-the-art computational tools for metagenomic data analysis of very short reads (about 100 bp). However, LCA has two main drawbacks: it may assign many reads to high taxonomic ranks, and it discards a large number of reads. Results: We present MTR, a new method for tackling these drawbacks using clustering at Multiple Taxonomic Ranks. Unlike LCA, which processes the reads one by one, MTR exploits information shared by reads. Specifically, MTR consists of two main phases. First, for each taxonomic rank, a collection of potential clusters of reads is generated, and each potential cluster is associated with a taxon at that rank. Next, a small number of clusters is selected at each rank using a combinatorial optimization algorithm. The effectiveness of the resulting method is tested on a large number of simulated and real-life metagenomes. Results of experiments show that MTR improves on LCA by discarding significantly fewer reads and by assigning many more reads to lower taxonomic ranks. Moreover, MTR provides a more faithful taxonomic characterization of the metagenome population distribution. Availability: Matlab and C++ source code for the method is available at http://cs.ru.nl/~gori/software/MTR.tar.gz. Contact: gori@cs.ru.nl; elenam@cs.ru.nl Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2011-01-15 2010-12-01 /pmc/articles/PMC3018814/ /pubmed/21127032 http://dx.doi.org/10.1093/bioinformatics/btq649 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Gori, Fabio Folino, Gianluigi Jetten, Mike S. M. Marchiori, Elena MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks |
title | MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks |
title_full | MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks |
title_fullStr | MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks |
title_full_unstemmed | MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks |
title_short | MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks |
title_sort | mtr: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3018814/ https://www.ncbi.nlm.nih.gov/pubmed/21127032 http://dx.doi.org/10.1093/bioinformatics/btq649 |
work_keys_str_mv | AT gorifabio mtrtaxonomicannotationofshortmetagenomicreadsusingclusteringatmultipletaxonomicranks AT folinogianluigi mtrtaxonomicannotationofshortmetagenomicreadsusingclusteringatmultipletaxonomicranks AT jettenmikesm mtrtaxonomicannotationofshortmetagenomicreadsusingclusteringatmultipletaxonomicranks AT marchiorielena mtrtaxonomicannotationofshortmetagenomicreadsusingclusteringatmultipletaxonomicranks |
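
The description field names the Lowest Common Ancestor (LCA) method and its two drawbacks without showing how a read is actually assigned. The following is a minimal sketch of the general LCA idea, not the authors' implementation and not the MTR algorithm. The function name `lca_assign`, the rank list `RANKS`, and the example lineages are illustrative assumptions; it also assumes each database hit for a read is already available as a full lineage, one taxon per rank.

```python
from typing import Optional, Sequence, Tuple

# Ranks ordered from highest (most general) to lowest (most specific).
# This particular list is an assumption made for the example.
RANKS = ("superkingdom", "phylum", "class", "order", "family", "genus", "species")

Lineage = Sequence[str]  # one taxon name per rank, in the same order as RANKS


def lca_assign(hit_lineages: Sequence[Lineage]) -> Optional[Tuple[str, str]]:
    """Return (rank, taxon) of the lowest rank at which all hits agree.

    Returns None when the read has no hits, or when its hits disagree
    even at the highest rank; in both cases the read is discarded.
    """
    assigned = None
    # zip(*hit_lineages) yields, for each rank, the taxa of all hits at that rank.
    for rank, taxa_at_rank in zip(RANKS, zip(*hit_lineages)):
        if len(set(taxa_at_rank)) != 1:
            break  # hits diverge here, so stop at the previous rank
        assigned = (rank, taxa_at_rank[0])
    return assigned


# Hypothetical lineages used only to exercise the function.
ecoli = ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
         "Enterobacteriaceae", "Escherichia", "Escherichia coli")
salmonella = ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
              "Enterobacteriaceae", "Salmonella", "Salmonella enterica")
bacillus = ("Bacteria", "Firmicutes", "Bacilli", "Bacillales",
            "Bacillaceae", "Bacillus", "Bacillus subtilis")

print(lca_assign([ecoli, ecoli]))       # all hits agree down to species
print(lca_assign([ecoli, salmonella]))  # hits diverge below family
print(lca_assign([ecoli, bacillus]))    # hits share only the superkingdom
print(lca_assign([]))                   # no hits: read is discarded
```

Because each read is handled in isolation, a single distant hit is enough to push the assignment up to a high rank (third call), and a read without usable hits is dropped (fourth call). These are the two drawbacks the description says MTR addresses by using information shared across reads.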