Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities
BACKGROUND: The field of metagenomics (study of genetic material recovered directly from an environment) has grown rapidly, with many bioinformatics analysis methods being developed. To ensure appropriate use of such methods, robust comparative evaluation of their accuracy and features is needed. Fo...
Main authors: | Peabody, Michael A.; Van Rossum, Thea; Lo, Raymond; Brinkman, Fiona S. L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634789/ https://www.ncbi.nlm.nih.gov/pubmed/26537885 http://dx.doi.org/10.1186/s12859-015-0788-5 |
_version_ | 1782399418610221056 |
---|---|
author | Peabody, Michael A.; Van Rossum, Thea; Lo, Raymond; Brinkman, Fiona S. L. |
author_facet | Peabody, Michael A.; Van Rossum, Thea; Lo, Raymond; Brinkman, Fiona S. L. |
author_sort | Peabody, Michael A. |
collection | PubMed |
description | BACKGROUND: The field of metagenomics (study of genetic material recovered directly from an environment) has grown rapidly, with many bioinformatics analysis methods being developed. To ensure appropriate use of such methods, robust comparative evaluation of their accuracy and features is needed. For taxonomic classification of sequence reads, such evaluation should include use of clade exclusion, which better evaluates a method’s accuracy when identical sequences are not present in any reference database, as is common in metagenomic analysis. To date, relatively small evaluations have been performed, with evaluation approaches like clade exclusion limited to assessment of new methods by the authors of the given method. What is needed is a rigorous, independent comparison between multiple major methods, using the same in silico and in vitro test datasets, with and without approaches like clade exclusion, to better characterize accuracy under different conditions. RESULTS: An overview of the features of 38 bioinformatics methods is provided, evaluating accuracy with a focus on 11 programs that have reference databases that can be modified and therefore most robustly evaluated with clade exclusion. Taxonomic classification of sequence reads was evaluated using both in silico and in vitro mock bacterial communities. Clade exclusion was used at taxonomic levels from species to class, identifying how well methods perform in progressively more difficult scenarios. A wide range of variability was found in the sensitivity, precision, overall accuracy, and computational demand for the programs evaluated. In experiments where distilled water was spiked with only 11 bacterial species, frequently dozens to hundreds of species were falsely predicted by the most popular programs. The different features of each method (forces predictions or not, etc.) are summarized, and additional analysis considerations discussed. CONCLUSIONS: The accuracy of shotgun metagenomics classification methods varies widely. No one program clearly outperformed others in all evaluation scenarios; rather, the results illustrate the strengths of different methods for different purposes. Researchers must appreciate method differences, choosing the program best suited for their particular analysis to avoid very misleading results. Use of standardized datasets for method comparisons is encouraged, as is use of mock microbial community controls suitable for a particular metagenomic analysis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0788-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4634789 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4634789 2015-11-06 Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities Peabody, Michael A. Van Rossum, Thea Lo, Raymond Brinkman, Fiona S. L. BMC Bioinformatics Research BACKGROUND: The field of metagenomics (study of genetic material recovered directly from an environment) has grown rapidly, with many bioinformatics analysis methods being developed. To ensure appropriate use of such methods, robust comparative evaluation of their accuracy and features is needed. For taxonomic classification of sequence reads, such evaluation should include use of clade exclusion, which better evaluates a method’s accuracy when identical sequences are not present in any reference database, as is common in metagenomic analysis. To date, relatively small evaluations have been performed, with evaluation approaches like clade exclusion limited to assessment of new methods by the authors of the given method. What is needed is a rigorous, independent comparison between multiple major methods, using the same in silico and in vitro test datasets, with and without approaches like clade exclusion, to better characterize accuracy under different conditions. RESULTS: An overview of the features of 38 bioinformatics methods is provided, evaluating accuracy with a focus on 11 programs that have reference databases that can be modified and therefore most robustly evaluated with clade exclusion. Taxonomic classification of sequence reads was evaluated using both in silico and in vitro mock bacterial communities. Clade exclusion was used at taxonomic levels from species to class, identifying how well methods perform in progressively more difficult scenarios. A wide range of variability was found in the sensitivity, precision, overall accuracy, and computational demand for the programs evaluated. In experiments where distilled water was spiked with only 11 bacterial species, frequently dozens to hundreds of species were falsely predicted by the most popular programs. The different features of each method (forces predictions or not, etc.) are summarized, and additional analysis considerations discussed. CONCLUSIONS: The accuracy of shotgun metagenomics classification methods varies widely. No one program clearly outperformed others in all evaluation scenarios; rather, the results illustrate the strengths of different methods for different purposes. Researchers must appreciate method differences, choosing the program best suited for their particular analysis to avoid very misleading results. Use of standardized datasets for method comparisons is encouraged, as is use of mock microbial community controls suitable for a particular metagenomic analysis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0788-5) contains supplementary material, which is available to authorized users. BioMed Central 2015-11-04 /pmc/articles/PMC4634789/ /pubmed/26537885 http://dx.doi.org/10.1186/s12859-015-0788-5 Text en © Peabody et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Peabody, Michael A. Van Rossum, Thea Lo, Raymond Brinkman, Fiona S. L. Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities |
title | Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities |
title_sort | evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634789/ https://www.ncbi.nlm.nih.gov/pubmed/26537885 http://dx.doi.org/10.1186/s12859-015-0788-5 |