
Re-Annotation of Protein-Coding Genes in the Genome of Saccharomyces cerevisiae Based on Support Vector Machines


Bibliographic details
Main authors: Lin, Dan; Yin, Xin; Wang, Xianlong; Zhou, Peng; Guo, Feng-Biao
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3707884/
https://www.ncbi.nlm.nih.gov/pubmed/23874379
http://dx.doi.org/10.1371/journal.pone.0064477
collection PubMed
description The annotation of the well-studied organism Saccharomyces cerevisiae has improved over the past decade, yet debate continues over the number of biologically significant open reading frames (ORFs) in the yeast genome. We revisited the total count of protein-coding genes in the S. cerevisiae S288c genome using a theoretical approach that combines the Support Vector Machine (SVM) method with six widely used measures of sequence statistical features. The accuracy of our method exceeds 99.5% in 10-fold cross-validation. Based on the annotation data in the Saccharomyces Genome Database (SGD), we studied the coding capacity of all 1744 ORFs that lack experimental evidence and suggested that, after removing 488 spurious ORFs, the overall number of protein-coding chromosomal ORFs in yeast should be 6091. The importance of the present work lies in at least two aspects. First, cross-validation and retrospective examination showed the fidelity of our method in recognizing ORFs that are likely to encode proteins. Second, we provide a web service, accessible at http://cobi.uestc.edu.cn/services/yeast/, that predicts protein-coding ORFs of the genus Saccharomyces with high accuracy.
id pubmed-3707884
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3707884, 2013-07-19; PLoS One, Research Article; Public Library of Science, 2013-07-10; Text, en; © 2013 Lin et al. License: http://creativecommons.org/licenses/by/4.0/. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article
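The record names the approach (an SVM over six sequence statistical features, assessed by 10-fold cross-validation) but does not list the features themselves. The Python sketch below is a hypothetical illustration of how a per-ORF feature vector and a 10-fold partition might be prepared. The three statistics used here (length in codons, GC content, GC content at the third codon position) and all function names are assumptions, not the authors' six features or their implementation, and the SVM training step is left out. It uses only the standard library.

```python
"""Hypothetical per-ORF sequence statistics and a 10-fold split (illustration only)."""
from __future__ import annotations

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list[str]:
    """Split an in-frame nucleotide sequence into consecutive codons (drops a trailing partial codon)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def is_candidate_orf(seq: str) -> bool:
    """True if seq starts with ATG, ends with a stop codon, has a length divisible by 3,
    and contains no in-frame stop codon before the final one."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    cs = codons(seq)
    if cs[0] != "ATG" or cs[-1] not in STOP_CODONS:
        return False
    return not any(c in STOP_CODONS for c in cs[:-1])


def orf_features(seq: str) -> list[float]:
    """Return [length in codons, GC content, GC3 content] for one ORF.

    Hypothetical feature choice for illustration; not the paper's six measures.
    """
    seq = seq.upper()
    cs = codons(seq)
    if not cs:
        raise ValueError("sequence must contain at least one complete codon")
    gc = sum(base in "GC" for base in seq) / len(seq)
    gc3 = sum(c[2] in "GC" for c in cs) / len(cs)
    return [float(len(cs)), gc, gc3]


def ten_fold_indices(n: int, k: int = 10) -> list[list[int]]:
    """Partition range(n) into k interleaved folds for k-fold cross-validation."""
    return [list(range(i, n, k)) for i in range(k)]
```

For example, `orf_features("ATGGCCTAA")` yields three codons, a GC content of 4/9 and a GC3 content of 2/3. In practice each vector, paired with a coding or non-coding label, would be passed to an SVM implementation (for instance scikit-learn's `SVC`), training on nine folds and testing on the remaining one, and repeating for all ten.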