
Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities

BACKGROUND: The field of metagenomics (study of genetic material recovered directly from an environment) has grown rapidly, with many bioinformatics analysis methods being developed. To ensure appropriate use of such methods, robust comparative evaluation of their accuracy and features is needed. Fo...


Bibliographic Details
Main authors: Peabody, Michael A., Van Rossum, Thea, Lo, Raymond, Brinkman, Fiona S. L.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634789/
https://www.ncbi.nlm.nih.gov/pubmed/26537885
http://dx.doi.org/10.1186/s12859-015-0788-5
author Peabody, Michael A.
Van Rossum, Thea
Lo, Raymond
Brinkman, Fiona S. L.
collection PubMed
description BACKGROUND: The field of metagenomics (study of genetic material recovered directly from an environment) has grown rapidly, with many bioinformatics analysis methods being developed. To ensure appropriate use of such methods, robust comparative evaluation of their accuracy and features is needed. For taxonomic classification of sequence reads, such evaluation should include use of clade exclusion, which better evaluates a method’s accuracy when identical sequences are not present in any reference database, as is common in metagenomic analysis. To date, relatively small evaluations have been performed, with evaluation approaches like clade exclusion limited to assessment of new methods by the authors of the given method. What is needed is a rigorous, independent comparison between multiple major methods, using the same in silico and in vitro test datasets, with and without approaches like clade exclusion, to better characterize accuracy under different conditions. RESULTS: An overview of the features of 38 bioinformatics methods is provided, evaluating accuracy with a focus on 11 programs that have reference databases that can be modified and therefore most robustly evaluated with clade exclusion. Taxonomic classification of sequence reads was evaluated using both in silico and in vitro mock bacterial communities. Clade exclusion was used at taxonomic levels from species to class—identifying how well methods perform in progressively more difficult scenarios. A wide range of variability was found in the sensitivity, precision, overall accuracy, and computational demand for the programs evaluated. In experiments where distilled water was spiked with only 11 bacterial species, frequently dozens to hundreds of species were falsely predicted by the most popular programs. The different features of each method (forces predictions or not, etc.) are summarized, and additional analysis considerations discussed. 
CONCLUSIONS: The accuracy of shotgun metagenomics classification methods varies widely. No one program clearly outperformed others in all evaluation scenarios; rather, the results illustrate the strengths of different methods for different purposes. Researchers must appreciate method differences, choosing the program best suited for their particular analysis to avoid very misleading results. Use of standardized datasets for method comparisons is encouraged, as is use of mock microbial community controls suitable for a particular metagenomic analysis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0788-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4634789
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4634789 2015-11-06 BMC Bioinformatics Research BioMed Central 2015-11-04 /pmc/articles/PMC4634789/ /pubmed/26537885 http://dx.doi.org/10.1186/s12859-015-0788-5 Text en © Peabody et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities
topic Research