
SCGPred: A Score-based Method for Gene Structure Prediction by Combining Multiple Sources of Evidence

Bibliographic details
Main authors: Li, Xiao; Ren, Qingan; Weng, Yang; Cai, Haoyang; Zhu, Yunmin; Zhang, Yizheng
Format: Online, Article, Text
Language: English
Published: Elsevier, 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5054121/
https://www.ncbi.nlm.nih.gov/pubmed/19329068
http://dx.doi.org/10.1016/S1672-0229(09)60005-X
collection PubMed
description Predicting protein-coding genes remains a significant challenge. Although a variety of computational programs that commonly use machine learning methods have emerged, their prediction accuracy remains low when they are applied to large genomic sequences. Moreover, computational gene finding in newly sequenced genomes is especially difficult because no large training set of validated genes is available. Here we present a new gene-finding program, SCGPred, that improves prediction accuracy by combining multiple sources of evidence. SCGPred can run in a supervised mode for previously well-studied genomes and in an unsupervised mode for novel genomes. In tests on datasets composed of large DNA sequences from human and from the novel genome of Ustilago maydis, SCGPred achieves a significant improvement over popular ab initio gene predictors. We also demonstrate that, in novel genomes, SCGPred can significantly improve prediction by combining several foreign gene finders with similarity alignments, an approach that outperforms other unsupervised methods. SCGPred can therefore serve as an alternative gene-finding tool for newly sequenced eukaryotic genomes. The program is freely available at http://bio.scu.edu.cn/SCGPred/.
id pubmed-5054121
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5054121 2016-10-14; Genomics Proteomics Bioinformatics; Method; Elsevier 2008; 2009-03-27; © 2008 Beijing Institute of Genomics; This is an open access article under the CC BY-NC-SA license (http://creativecommons.org/licenses/by-nc-sa/3.0/).
topic Method