Consistency of metagenomic assignment programs in simulated and real data

BACKGROUND: Metagenomics is the genomic study of uncultured environmental samples, which has been greatly facilitated by the advent of shotgun-sequencing technologies. One of the main focuses of metagenomics is the discovery of previously uncultured microorganisms, which makes the assignment of sequ...

Bibliographic Details
Main authors: Garcia-Etxebarria, Koldo, Garcia-Garcerà, Marc, Calafell, Francesc
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3986635/
https://www.ncbi.nlm.nih.gov/pubmed/24678591
http://dx.doi.org/10.1186/1471-2105-15-90
_version_ 1782311744898596864
author Garcia-Etxebarria, Koldo
Garcia-Garcerà, Marc
Calafell, Francesc
author_sort Garcia-Etxebarria, Koldo
collection PubMed
description BACKGROUND: Metagenomics is the genomic study of uncultured environmental samples, which has been greatly facilitated by the advent of shotgun-sequencing technologies. One of the main focuses of metagenomics is the discovery of previously uncultured microorganisms, which makes the assignment of sequences to a particular taxon a challenge and a crucial step. Recently, several methods have been developed to perform this task, based on different methodologies such as sequence composition or sequence similarity. The sequence composition methods have the ability to completely assign the whole dataset. However, their use in metagenomics and the study of their performance with real data is limited. In this work, we assess the consistency of three different methods (BLAST + Lowest Common Ancestor, Phymm, and Naïve Bayesian Classifier) in assigning real and simulated sequence reads. RESULTS: Both in real and in simulated data, BLAST + Lowest Common Ancestor (BLAST + LCA), Phymm, and Naïve Bayesian Classifier consistently assign a larger number of reads in higher taxonomic levels than in lower levels. However, discrepancies increase at lower taxonomic levels. In simulated data, consistent assignments between all three methods showed greater precision than assignments based on Phymm or Bayesian Classifier alone, since the BLAST + LCA algorithm performed best. In addition, assignment consistency in real data increased with sequence read length, in agreement with previously published simulation results. CONCLUSIONS: The use and combination of different approaches is advisable to assign metagenomic reads. Although the sensitivity could be reduced, the reliability can be increased by using the reads consistently assigned to the same taxa by, at least, two methods, and by training the programs using all available information.
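The conclusion above recommends keeping only reads that at least two methods assign to the same taxon. The following is a minimal sketch of that filtering rule, not the authors' implementation: the function name `consensus_assignments`, the `min_agree` parameter, the input layout (one dictionary of read ID to taxon per method), and the read IDs and taxa in the usage example are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Optional


def consensus_assignments(
    calls_by_method: Dict[str, Dict[str, Optional[str]]],
    min_agree: int = 2,
) -> Dict[str, str]:
    """Keep reads that at least `min_agree` methods assign to the same taxon.

    `calls_by_method` maps a method name (hypothetical labels such as
    "blast_lca", "phymm", "nbc") to a {read_id: taxon} dictionary.
    A missing read or a value of None means the method made no assignment.
    """
    # Collect every read seen by any method.
    read_ids = set()
    for calls in calls_by_method.values():
        read_ids.update(calls)

    consensus: Dict[str, str] = {}
    for read_id in sorted(read_ids):
        # Count how many methods voted for each taxon for this read.
        votes = Counter(
            calls[read_id]
            for calls in calls_by_method.values()
            if calls.get(read_id) is not None
        )
        if not votes:
            continue
        taxon, count = votes.most_common(1)[0]
        if count >= min_agree:
            consensus[read_id] = taxon
    return consensus


if __name__ == "__main__":
    # Invented example data at a single taxonomic level (genus).
    calls = {
        "blast_lca": {"r1": "Escherichia", "r2": "Bacillus"},
        "phymm": {"r1": "Escherichia", "r2": "Clostridium", "r3": "Vibrio"},
        "nbc": {"r1": "Salmonella", "r2": "Listeria", "r3": "Vibrio"},
    }
    print(consensus_assignments(calls))
```

With three methods and `min_agree=2`, at most one taxon can reach the threshold for a given read, so no tie-breaking is needed. The sketch compares assignments at one taxonomic level only; repeating it per level (phylum down to genus) would be one way to observe the level-dependent discrepancies the abstract describes.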
format Online
Article
Text
id pubmed-3986635
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3986635 2014-04-16 Consistency of metagenomic assignment programs in simulated and real data Garcia-Etxebarria, Koldo Garcia-Garcerà, Marc Calafell, Francesc BMC Bioinformatics Research Article BioMed Central 2014-03-28 /pmc/articles/PMC3986635/ /pubmed/24678591 http://dx.doi.org/10.1186/1471-2105-15-90 Text en Copyright © 2014 Garcia-Etxebarria et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
spellingShingle Research Article
Garcia-Etxebarria, Koldo
Garcia-Garcerà, Marc
Calafell, Francesc
Consistency of metagenomic assignment programs in simulated and real data
title Consistency of metagenomic assignment programs in simulated and real data
title_sort consistency of metagenomic assignment programs in simulated and real data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3986635/
https://www.ncbi.nlm.nih.gov/pubmed/24678591
http://dx.doi.org/10.1186/1471-2105-15-90