Global Discriminative Learning for Higher-Accuracy Computational Gene Prediction

Bibliographic Details
Main Authors: Bernal, Axel; Crammer, Koby; Hatzigeorgiou, Artemis; Pereira, Fernando
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1828702/
https://www.ncbi.nlm.nih.gov/pubmed/17367206
http://dx.doi.org/10.1371/journal.pcbi.0030054
author Bernal, Axel
Crammer, Koby
Hatzigeorgiou, Artemis
Pereira, Fernando
collection PubMed
description Most ab initio gene predictors use a probabilistic sequence model, typically a hidden Markov model, to combine separately trained models of genomic signals and content. By combining separate models of relevant genomic features, such gene predictors can exploit small training sets and incomplete annotations, and can be trained fairly efficiently. However, that type of piecewise training does not optimize prediction accuracy and has difficulty in accounting for statistical dependencies among different parts of the gene model. With genomic information being created at an ever-increasing rate, it is worth investigating alternative approaches in which many different types of genomic evidence, with complex statistical dependencies, can be integrated by discriminative learning to maximize annotation accuracy. Among discriminative learning methods, large-margin classifiers have become prominent because of the success of support vector machines (SVM) in many classification tasks. We describe CRAIG, a new program for ab initio gene prediction based on a conditional random field model with semi-Markov structure that is trained with an online large-margin algorithm related to multiclass SVMs. Our experiments on benchmark vertebrate datasets and on regions from the ENCODE project show significant improvements in prediction accuracy over published gene predictors that use intrinsic features only, particularly at the gene level and on genes with long introns.
format Text
id pubmed-1828702
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1828702 2007-03-20 PLoS Comput Biol Research Article Public Library of Science 2007-03 2007-03-16 Text en © 2007 Bernal et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Global Discriminative Learning for Higher-Accuracy Computational Gene Prediction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1828702/
https://www.ncbi.nlm.nih.gov/pubmed/17367206
http://dx.doi.org/10.1371/journal.pcbi.0030054