Combining gene prediction methods to improve metagenomic gene annotation

BACKGROUND: Traditional gene annotation methods rely on characteristics that may not be available in short reads generated from next generation technology, resulting in suboptimal performance for metagenomic (environmental) samples. Therefore, in recent years, new programs have been developed that o...

Bibliographic Details
Main authors:	Yok, Non G; Rosen, Gail L
Format:	Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3042383/ https://www.ncbi.nlm.nih.gov/pubmed/21232129 http://dx.doi.org/10.1186/1471-2105-12-20

author	Yok, Non G Rosen, Gail L
collection	PubMed
description	BACKGROUND: Traditional gene annotation methods rely on characteristics that may not be available in short reads generated from next generation technology, resulting in suboptimal performance for metagenomic (environmental) samples. Therefore, in recent years, new programs have been developed that optimize performance on short reads. In this work, we benchmark three metagenomic gene prediction programs and combine their predictions to improve metagenomic read gene annotation. RESULTS: We not only analyze the programs' performance at different read-lengths like similar studies, but also separate different types of reads, including intra- and intergenic regions, for analysis. The main deficiencies are in the algorithms' ability to predict non-coding regions and gene edges, resulting in more false-positives and false-negatives than desired. In fact, the specificities of the algorithms are notably worse than the sensitivities. By combining the programs' predictions, we show significant improvement in specificity at minimal cost to sensitivity, resulting in 4% improvement in accuracy for 100 bp reads with ~1% improvement in accuracy for 200 bp reads and above. To correctly annotate the start and stop of the genes, we find that a consensus of all the predictors performs best for shorter read lengths while a unanimous agreement is better for longer read lengths, boosting annotation accuracy by 1-8%. We also demonstrate use of the classifier combinations on a real dataset. CONCLUSIONS: To optimize the performance for both prediction and annotation accuracies, we conclude that the consensus of all methods (or a majority vote) is the best for reads 400 bp and shorter, while using the intersection of GeneMark and Orphelia predictions is the best for reads 500 bp and longer. We demonstrate that most methods predict over 80% coding (including partially coding) reads on a real human gut sample sequenced by Illumina technology.
format	Text
id	pubmed-3042383
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3042383 2011-02-25 Combining gene prediction methods to improve metagenomic gene annotation Yok, Non G Rosen, Gail L BMC Bioinformatics Research Article BioMed Central 2011-01-13 /pmc/articles/PMC3042383/ /pubmed/21232129 http://dx.doi.org/10.1186/1471-2105-12-20 Text en Copyright ©2011 Yok and Rosen; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Combining gene prediction methods to improve metagenomic gene annotation
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3042383/ https://www.ncbi.nlm.nih.gov/pubmed/21232129 http://dx.doi.org/10.1186/1471-2105-12-20
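
The combination rule stated in the abstract's conclusions (majority vote for reads of 400 bp and shorter, intersection of GeneMark and Orphelia for reads of 500 bp and longer) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the boolean "coding" call per predictor, the placeholder name `third_predictor`, and the treatment of reads between 401 and 499 bp are all hypothetical.

```python
from typing import Mapping

# Assumption: the abstract only covers reads of 400 bp and shorter and reads of
# 500 bp and longer. Treating 401-499 bp reads like short reads is a guess.
LONG_READ_MIN_BP = 500


def majority_vote(calls: Mapping[str, bool]) -> bool:
    """Return True when more than half of the predictors call the read coding."""
    votes = sum(calls.values())
    return votes * 2 > len(calls)


def combine_calls(read_length_bp: int, calls: Mapping[str, bool]) -> bool:
    """Combine per-predictor coding calls for one read.

    Long reads use the intersection of GeneMark and Orphelia; all other
    reads use a majority vote over every predictor supplied.
    """
    if read_length_bp >= LONG_READ_MIN_BP:
        return calls["GeneMark"] and calls["Orphelia"]
    return majority_vote(calls)


if __name__ == "__main__":
    # "third_predictor" is a placeholder name for the third benchmarked program.
    calls = {"GeneMark": True, "Orphelia": False, "third_predictor": True}
    print(combine_calls(200, calls))  # short read: decided by majority vote
    print(combine_calls(700, calls))  # long read: GeneMark AND Orphelia
```

Keeping the length cutoff as a single named constant makes it easy to change how the unspecified 401-499 bp range is handled.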
