Prediction of Drosophila melanogaster gene function using Support Vector Machines
BACKGROUND: While the genomes of hundreds of organisms have been sequenced and good approaches exist for finding protein encoding genes, an important remaining challenge is predicting the functions of the large fraction of genes for which there is no annotation. Large gene expression datasets from m...
Main authors: | Mitsakakis, Nicholas; Razak, Zak; Escobar, Michael; Westwood, J Timothy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3669044/ https://www.ncbi.nlm.nih.gov/pubmed/23547736 http://dx.doi.org/10.1186/1756-0381-6-8 |
_version_ | 1782271690942709760 |
---|---|
author | Mitsakakis, Nicholas Razak, Zak Escobar, Michael Westwood, J Timothy |
author_facet | Mitsakakis, Nicholas Razak, Zak Escobar, Michael Westwood, J Timothy |
author_sort | Mitsakakis, Nicholas |
collection | PubMed |
description | BACKGROUND: While the genomes of hundreds of organisms have been sequenced and good approaches exist for finding protein encoding genes, an important remaining challenge is predicting the functions of the large fraction of genes for which there is no annotation. Large gene expression datasets from microarray experiments already exist and many of these can be used to help assign potential functions to these genes. We have applied Support Vector Machines (SVM), a sigmoid fitting function and a stratified cross‐validation approach to analyze a large microarray experiment dataset from Drosophila melanogaster in order to predict possible functions for previously un‐annotated genes. A total of approximately 5043 different genes, or about one‐third of the predicted genes in the D. melanogaster genome, are represented in the dataset and 1854 (or 37%) of these genes are un‐annotated. RESULTS: 39 Gene Ontology Biological Process (GO‐BP) categories were found with precision value equal or larger than 0.75, when recall was fixed at the 0.4 level. For two of those categories, we have provided additional support for assigning given genes to the category by showing that the majority of transcripts for the genes belonging in a given category have a similar localization pattern during embryogenesis. Additionally, by assessing the predictions using a confidence score, we have been able to provide a putative GO‐BP term for 1422 previously un‐annotated genes or about 77% of the un‐annotated genes represented on the microarray and about 19% of all of the un‐annotated genes in the D. melanogaster genome. CONCLUSIONS: Our study successfully employs a number of SVM classifiers, accompanied by detailed calibration and validation techniques, to generate a number of predictions for new annotations for D. melanogaster genes. The applied probabilistic analysis to SVM output improves the interpretability of the prediction results and the objectivity of the validation procedure. |
format | Online Article Text |
id | pubmed-3669044 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3669044 2013-06-03 Prediction of Drosophila melanogaster gene function using Support Vector Machines Mitsakakis, Nicholas Razak, Zak Escobar, Michael Westwood, J Timothy BioData Min Research BACKGROUND: While the genomes of hundreds of organisms have been sequenced and good approaches exist for finding protein encoding genes, an important remaining challenge is predicting the functions of the large fraction of genes for which there is no annotation. Large gene expression datasets from microarray experiments already exist and many of these can be used to help assign potential functions to these genes. We have applied Support Vector Machines (SVM), a sigmoid fitting function and a stratified cross‐validation approach to analyze a large microarray experiment dataset from Drosophila melanogaster in order to predict possible functions for previously un‐annotated genes. A total of approximately 5043 different genes, or about one‐third of the predicted genes in the D. melanogaster genome, are represented in the dataset and 1854 (or 37%) of these genes are un‐annotated. RESULTS: 39 Gene Ontology Biological Process (GO‐BP) categories were found with precision value equal or larger than 0.75, when recall was fixed at the 0.4 level. For two of those categories, we have provided additional support for assigning given genes to the category by showing that the majority of transcripts for the genes belonging in a given category have a similar localization pattern during embryogenesis. Additionally, by assessing the predictions using a confidence score, we have been able to provide a putative GO‐BP term for 1422 previously un‐annotated genes or about 77% of the un‐annotated genes represented on the microarray and about 19% of all of the un‐annotated genes in the D. melanogaster genome. CONCLUSIONS: Our study successfully employs a number of SVM classifiers, accompanied by detailed calibration and validation techniques, to generate a number of predictions for new annotations for D. melanogaster genes. The applied probabilistic analysis to SVM output improves the interpretability of the prediction results and the objectivity of the validation procedure. BioMed Central 2013-04-02 /pmc/articles/PMC3669044/ /pubmed/23547736 http://dx.doi.org/10.1186/1756-0381-6-8 Text en Copyright © 2013 Mitsakakis et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Mitsakakis, Nicholas Razak, Zak Escobar, Michael Westwood, J Timothy Prediction of Drosophila melanogaster gene function using Support Vector Machines |
title | Prediction of Drosophila melanogaster gene function using Support Vector Machines |
title_full | Prediction of Drosophila melanogaster gene function using Support Vector Machines |
title_fullStr | Prediction of Drosophila melanogaster gene function using Support Vector Machines |
title_full_unstemmed | Prediction of Drosophila melanogaster gene function using Support Vector Machines |
title_short | Prediction of Drosophila melanogaster gene function using Support Vector Machines |
title_sort | prediction of drosophila melanogaster gene function using support vector machines |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3669044/ https://www.ncbi.nlm.nih.gov/pubmed/23547736 http://dx.doi.org/10.1186/1756-0381-6-8 |
work_keys_str_mv | AT mitsakakisnicholas predictionofdrosophilamelanogastergenefunctionusingsupportvectormachines AT razakzak predictionofdrosophilamelanogastergenefunctionusingsupportvectormachines AT escobarmichael predictionofdrosophilamelanogastergenefunctionusingsupportvectormachines AT westwoodjtimothy predictionofdrosophilamelanogastergenefunctionusingsupportvectormachines |