Applying Support Vector Machines for Gene ontology based gene function prediction
Main authors: | Vinayagam, Arunachalam; König, Rainer; Moormann, Jutta; Schubert, Falk; Eils, Roland; Glatting, Karl-Heinz; Suhai, Sándor |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2004 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC517617/ https://www.ncbi.nlm.nih.gov/pubmed/15333146 http://dx.doi.org/10.1186/1471-2105-5-116 |
author | Vinayagam, Arunachalam; König, Rainer; Moormann, Jutta; Schubert, Falk; Eils, Roland; Glatting, Karl-Heinz; Suhai, Sándor |
---|---|
collection | PubMed |
description | BACKGROUND: The current progress in sequencing projects calls for rapid, reliable and accurate function assignments of gene products. A variety of methods has been designed to annotate sequences on a large scale. However, these methods can either only be applied for specific subsets, or their results are not formalised, or they do not provide precise confidence estimates for their predictions. RESULTS: We have developed a large-scale annotation system that tackles all of these shortcomings. In our approach, annotation was provided through Gene Ontology terms by applying multiple Support Vector Machines (SVM) for the classification of correct and false predictions. The general performance of the system was benchmarked with a large dataset. An organism-wise cross-validation was performed to define confidence estimates, resulting in an average precision of 80% for 74% of all test sequences. The validation results show that the prediction performance was organism-independent and could reproduce the annotation of other automated systems as well as high-quality manual annotations. We applied our trained classification system to Xenopus laevis sequences, yielding functional annotation for more than half of the known expressed genome. Compared to the currently available annotation, we provided more than twice the number of contigs with good quality annotation, and additionally we assigned a confidence value to each predicted GO term. CONCLUSIONS: We present a complete automated annotation system that overcomes many of the usual problems by applying a controlled vocabulary of Gene Ontology and an established classification method on large and well-described sequence data sets. In a case study, the function for Xenopus laevis contig sequences was predicted and the results are publicly available at . |
format | Text |
id | pubmed-517617 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2004 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics |
publication date | 2004-08-26 |
rights | Copyright © 2004 Vinayagam et al; licensee BioMed Central Ltd. |
title | Applying Support Vector Machines for Gene ontology based gene function prediction |
topic | Methodology Article |
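
The abstract describes an organism-wise cross-validation used to define confidence estimates. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes each GO-term prediction already carries a classifier score (for example an SVM decision value) and a correct/false label, holds out one organism at a time, picks a score threshold on the remaining organisms that reaches a target precision (0.8 here, echoing the abstract's 80% figure), and reports precision and coverage on the held-out organism. The threshold-selection rule, the helper names `precision_coverage`, `pick_threshold` and `leave_one_organism_out`, and the toy data are all hypothetical; retraining the classifier inside each fold is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (organism, classifier score, GO prediction is correct).
# Assumption: scores come from an already trained classifier such as an SVM;
# retraining the classifier inside each fold is omitted for brevity.
Record = Tuple[str, float, bool]


def precision_coverage(records: List[Record], threshold: float) -> Tuple[float, float]:
    """Precision and coverage of the predictions scoring at or above threshold."""
    accepted = [correct for _, score, correct in records if score >= threshold]
    if not accepted:
        return 0.0, 0.0
    return sum(accepted) / len(accepted), len(accepted) / len(records)


def pick_threshold(records: List[Record], target_precision: float) -> float:
    """Lowest observed score whose accepted set reaches the target precision."""
    for threshold in sorted({score for _, score, _ in records}):
        precision, _ = precision_coverage(records, threshold)
        if precision >= target_precision:
            return threshold
    return float("inf")  # target unreachable: accept nothing


def leave_one_organism_out(
    records: List[Record], target_precision: float = 0.8
) -> Dict[str, Tuple[float, float]]:
    """Hold out each organism in turn; calibrate on the rest, evaluate on it."""
    by_organism: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        by_organism[record[0]].append(record)

    results: Dict[str, Tuple[float, float]] = {}
    for held_out, test in by_organism.items():
        # Training fold: every record from the other organisms.
        train = [r for org, recs in by_organism.items() if org != held_out for r in recs]
        threshold = pick_threshold(train, target_precision)
        results[held_out] = precision_coverage(test, threshold)
    return results


# Toy data invented for illustration only; not from the paper.
toy: List[Record] = [
    ("yeast", 0.9, True), ("yeast", 0.7, True), ("yeast", 0.4, False),
    ("fly", 0.8, True), ("fly", 0.6, False), ("fly", 0.3, False),
    ("worm", 0.95, True), ("worm", 0.5, True), ("worm", 0.2, False),
]

results = leave_one_organism_out(toy)
for organism, (precision, coverage) in results.items():
    print(f"{organism}: precision={precision:.2f}, coverage={coverage:.2f}")
```

On the toy data the fly fold reaches only 0.50 precision against the 0.8 target, the kind of organism dependence such a validation is designed to expose; the abstract reports that the real system's performance was organism-independent.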