
A comparative evaluation of sequence classification programs

BACKGROUND: A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented i...


Bibliographic Details
Main Authors:	Bazinet, Adam L, Cummings, Michael P
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3428669/ https://www.ncbi.nlm.nih.gov/pubmed/22574964 http://dx.doi.org/10.1186/1471-2105-13-92

_version_	1782241724646555648
author	Bazinet, Adam L Cummings, Michael P
collection	PubMed
description	BACKGROUND: A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for ‘barcoding genes’ like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis. RESULTS: We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known. CONCLUSIONS: We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observe some general trends and patterns that will be useful to researchers who use sequence classification programs.
format	Online Article Text
id	pubmed-3428669
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3428669 2012-08-30 A comparative evaluation of sequence classification programs Bazinet, Adam L Cummings, Michael P BMC Bioinformatics Research Article BioMed Central 2012-05-10 /pmc/articles/PMC3428669/ /pubmed/22574964 http://dx.doi.org/10.1186/1471-2105-13-92 Text en Copyright ©2012 Bazinet and Cummings; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A comparative evaluation of sequence classification programs
topic	Research Article
