
Prediction of Drosophila melanogaster gene function using Support Vector Machines

BACKGROUND: While the genomes of hundreds of organisms have been sequenced and good approaches exist for finding protein encoding genes, an important remaining challenge is predicting the functions of the large fraction of genes for which there is no annotation. Large gene expression datasets from m...


Bibliographic Details
Main authors: Mitsakakis, Nicholas, Razak, Zak, Escobar, Michael, Westwood, J Timothy
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3669044/
https://www.ncbi.nlm.nih.gov/pubmed/23547736
http://dx.doi.org/10.1186/1756-0381-6-8
_version_ 1782271690942709760
author Mitsakakis, Nicholas
Razak, Zak
Escobar, Michael
Westwood, J Timothy
author_facet Mitsakakis, Nicholas
Razak, Zak
Escobar, Michael
Westwood, J Timothy
author_sort Mitsakakis, Nicholas
collection PubMed
description BACKGROUND: While the genomes of hundreds of organisms have been sequenced and good approaches exist for finding protein encoding genes, an important remaining challenge is predicting the functions of the large fraction of genes for which there is no annotation. Large gene expression datasets from microarray experiments already exist and many of these can be used to help assign potential functions to these genes. We have applied Support Vector Machines (SVM), a sigmoid fitting function and a stratified cross‐validation approach to analyze a large microarray experiment dataset from Drosophila melanogaster in order to predict possible functions for previously un‐annotated genes. A total of approximately 5043 different genes, or about one‐third of the predicted genes in the D. melanogaster genome, are represented in the dataset and 1854 (or 37%) of these genes are un‐annotated. RESULTS: 39 Gene Ontology Biological Process (GO‐BP) categories were found with precision value equal or larger than 0.75, when recall was fixed at the 0.4 level. For two of those categories, we have provided additional support for assigning given genes to the category by showing that the majority of transcripts for the genes belonging in a given category have a similar localization pattern during embryogenesis. Additionally, by assessing the predictions using a confidence score, we have been able to provide a putative GO‐BP term for 1422 previously un‐annotated genes or about 77% of the un‐annotated genes represented on the microarray and about 19% of all of the un‐annotated genes in the D. melanogaster genome. CONCLUSIONS: Our study successfully employs a number of SVM classifiers, accompanied by detailed calibration and validation techniques, to generate a number of predictions for new annotations for D. melanogaster genes. The applied probabilistic analysis to SVM output improves the interpretability of the prediction results and the objectivity of the validation procedure.
format Online
Article
Text
id pubmed-3669044
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3669044 2013-06-03 Prediction of Drosophila melanogaster gene function using Support Vector Machines Mitsakakis, Nicholas Razak, Zak Escobar, Michael Westwood, J Timothy BioData Min Research BioMed Central 2013-04-02 /pmc/articles/PMC3669044/ /pubmed/23547736 http://dx.doi.org/10.1186/1756-0381-6-8 Text en Copyright © 2013 Mitsakakis et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Mitsakakis, Nicholas
Razak, Zak
Escobar, Michael
Westwood, J Timothy
Prediction of Drosophila melanogaster gene function using Support Vector Machines
title Prediction of Drosophila melanogaster gene function using Support Vector Machines
title_full Prediction of Drosophila melanogaster gene function using Support Vector Machines
title_fullStr Prediction of Drosophila melanogaster gene function using Support Vector Machines
title_full_unstemmed Prediction of Drosophila melanogaster gene function using Support Vector Machines
title_short Prediction of Drosophila melanogaster gene function using Support Vector Machines
title_sort prediction of drosophila melanogaster gene function using support vector machines
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3669044/
https://www.ncbi.nlm.nih.gov/pubmed/23547736
http://dx.doi.org/10.1186/1756-0381-6-8
work_keys_str_mv AT mitsakakisnicholas predictionofdrosophilamelanogastergenefunctionusingsupportvectormachines
AT razakzak predictionofdrosophilamelanogastergenefunctionusingsupportvectormachines
AT escobarmichael predictionofdrosophilamelanogastergenefunctionusingsupportvectormachines
AT westwoodjtimothy predictionofdrosophilamelanogastergenefunctionusingsupportvectormachines
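
The RESULTS criterion quoted in the description field (precision of at least 0.75 with recall fixed at 0.4) can be illustrated with a small sketch. This is not the authors' code: the function name `precision_at_recall`, the toy scores and labels, and the rule for fixing recall (take the shortest ranked list whose recall reaches the target, ignoring tied scores) are all assumptions made for illustration. The last lines only re-check the percentages stated in the abstract.

```python
def precision_at_recall(scores, labels, target_recall=0.4):
    """Precision of the shortest top-ranked list whose recall reaches target_recall.

    scores: classifier confidence per gene (higher = more likely in the category).
    labels: 1 if the gene is annotated to the category, else 0.
    Hypothetical helper; tie handling between equal scores is ignored.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank genes from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for n_called, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= target_recall:
            return true_pos / n_called
    # Only reachable if target_recall is greater than 1.
    raise ValueError("target_recall cannot be reached")


# Toy data (invented for illustration): 10 genes, 5 of them true members.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]

precision = precision_at_recall(toy_scores, toy_labels, target_recall=0.4)
passes_cutoff = precision >= 0.75  # the abstract's precision threshold

# Arithmetic check of the counts quoted in the abstract.
pct_unannotated_on_array = round(100 * 1854 / 5043)  # abstract states 37%
pct_unannotated_predicted = round(100 * 1422 / 1854)  # abstract states about 77%
```

On the toy data, recall first reaches 0.4 after the third ranked gene (2 of 5 true members found among 3 calls), so this invented category would fall short of the 0.75 precision cutoff.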