
GeneValidator: identify problems with protein-coding gene predictions

Summary: Genomes of emerging model organisms are now being sequenced at very low cost. However, obtaining accurate gene predictions remains challenging: even the best gene prediction algorithms make substantial errors and can jeopardize subsequent analyses. Therefore, many predicted genes must be ti...


Bibliographic Details
Main authors: Drăgan, Monica-Andreea, Moghul, Ismail, Priyam, Anurag, Bustos, Claudio, Wurm, Yannick
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4866521/
https://www.ncbi.nlm.nih.gov/pubmed/26787666
http://dx.doi.org/10.1093/bioinformatics/btw015
author Drăgan, Monica-Andreea
Moghul, Ismail
Priyam, Anurag
Bustos, Claudio
Wurm, Yannick
collection PubMed
description Summary: Genomes of emerging model organisms are now being sequenced at very low cost. However, obtaining accurate gene predictions remains challenging: even the best gene prediction algorithms make substantial errors and can jeopardize subsequent analyses. Therefore, many predicted genes must be time-consumingly visually inspected and manually curated. We developed GeneValidator (GV) to automatically identify problematic gene predictions and to aid manual curation. For each gene, GV performs multiple analyses based on comparisons to gene sequences from large databases. The resulting report identifies problematic gene predictions and includes extensive statistics and graphs for each prediction to guide manual curation efforts. GV thus accelerates and enhances the work of biocurators and researchers who need accurate gene predictions from newly sequenced genomes. Availability and implementation: GV can be used through a web interface or in the command-line. GV is open-source (AGPL), available at https://wurmlab.github.io/tools/genevalidator. Contact: y.wurm@qmul.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4866521
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-48665212016-05-16 GeneValidator: identify problems with protein-coding gene predictions Drăgan, Monica-Andreea Moghul, Ismail Priyam, Anurag Bustos, Claudio Wurm, Yannick Bioinformatics Applications Notes Summary: Genomes of emerging model organisms are now being sequenced at very low cost. However, obtaining accurate gene predictions remains challenging: even the best gene prediction algorithms make substantial errors and can jeopardize subsequent analyses. Therefore, many predicted genes must be time-consumingly visually inspected and manually curated. We developed GeneValidator (GV) to automatically identify problematic gene predictions and to aid manual curation. For each gene, GV performs multiple analyses based on comparisons to gene sequences from large databases. The resulting report identifies problematic gene predictions and includes extensive statistics and graphs for each prediction to guide manual curation efforts. GV thus accelerates and enhances the work of biocurators and researchers who need accurate gene predictions from newly sequenced genomes. Availability and implementation: GV can be used through a web interface or in the command-line. GV is open-source (AGPL), available at https://wurmlab.github.io/tools/genevalidator. Contact: y.wurm@qmul.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2016-05-15 2016-01-18 /pmc/articles/PMC4866521/ /pubmed/26787666 http://dx.doi.org/10.1093/bioinformatics/btw015 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title GeneValidator: identify problems with protein-coding gene predictions
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4866521/
https://www.ncbi.nlm.nih.gov/pubmed/26787666
http://dx.doi.org/10.1093/bioinformatics/btw015