
FragGeneScan: predicting genes in short and error-prone reads

The advances of next-generation sequencing technology have facilitated metagenomics research that attempts to determine directly the whole collection of genetic material within an environmental sample (i.e. the metagenome). Identification of genes directly from short reads has become an important ye...


Bibliographic Details
Main Authors: Rho, Mina; Tang, Haixu; Ye, Yuzhen
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2978382/
https://www.ncbi.nlm.nih.gov/pubmed/20805240
http://dx.doi.org/10.1093/nar/gkq747
author Rho, Mina
Tang, Haixu
Ye, Yuzhen
collection PubMed
description The advances of next-generation sequencing technology have facilitated metagenomics research that attempts to determine directly the whole collection of genetic material within an environmental sample (i.e. the metagenome). Identification of genes directly from short reads has become an important yet challenging problem in annotating metagenomes, since the assembly of metagenomes is often not available. Gene predictors developed for whole genomes (e.g. Glimmer) and recently developed for metagenomic sequences (e.g. MetaGene) show a significant decrease in performance as the sequencing error rates increase, or as reads get shorter. We have developed a novel gene prediction method FragGeneScan, which combines sequencing error models and codon usages in a hidden Markov model to improve the prediction of protein-coding region in short reads. The performance of FragGeneScan was comparable to Glimmer and MetaGene for complete genomes. But for short reads, FragGeneScan consistently outperformed MetaGene (accuracy improved ∼62% for reads of 400 bases with 1% sequencing errors, and ∼18% for short reads of 100 bases that are error free). When applied to metagenomes, FragGeneScan recovered substantially more genes than MetaGene predicted (>90% of the genes identified by homology search), and many novel genes with no homologs in current protein sequence database.
format Text
id pubmed-2978382
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2978382 2010-11-12 FragGeneScan: predicting genes in short and error-prone reads Rho, Mina Tang, Haixu Ye, Yuzhen Nucleic Acids Res Methods Online Oxford University Press 2010-11 2010-08-30 /pmc/articles/PMC2978382/ /pubmed/20805240 http://dx.doi.org/10.1093/nar/gkq747 Text en © The Author(s) 2010. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title FragGeneScan: predicting genes in short and error-prone reads
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2978382/
https://www.ncbi.nlm.nih.gov/pubmed/20805240
http://dx.doi.org/10.1093/nar/gkq747