Evaluating techniques for metagenome annotation using simulated sequence data

The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for...

Bibliographic Details
Main Authors:	Randle-Boggis, Richard J., Helgason, Thorunn, Sapp, Melanie, Ashton, Peter D.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2016
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4892715/ https://www.ncbi.nlm.nih.gov/pubmed/27162180 http://dx.doi.org/10.1093/femsec/fiw095

author	Randle-Boggis, Richard J. Helgason, Thorunn Sapp, Melanie Ashton, Peter D.
collection	PubMed
description	The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for annotating DNA sequences. The tools, databases and parameters used can have a significant impact on the results: naïve choice of these factors can result in a false representation of community composition and function. We use a simulated metagenome to show how different parameters affect annotation accuracy by evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and Megablast. This simulated metagenome allowed the recovery of known organism and function abundances to be quantitatively evaluated, which is not possible for environmental metagenomes. The performance of each program and database varied, e.g. One Codex correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced many false positive annotations. This effect decreased as the taxonomic level investigated increased. Selecting more stringent parameters decreases the annotation sensitivity, but increases precision. Ultimately, there is a trade-off between taxonomic resolution and annotation accuracy. These results should be considered when annotating metagenomes and interpreting results from previous studies.
format	Online Article Text
id	pubmed-4892715
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-4892715 2016-06-07 FEMS Microbiol Ecol Research Article Oxford University Press 2016-05-08 2016-07-01 /pmc/articles/PMC4892715/ /pubmed/27162180 http://dx.doi.org/10.1093/femsec/fiw095 Text en © FEMS 2016.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Evaluating techniques for metagenome annotation using simulated sequence data
topic	Research Article
