Gene function prediction using labeled and unlabeled data
BACKGROUND: In general, gene function prediction can be formalized as a classification problem using machine learning techniques. Usually, both labeled positive and negative samples are needed to train the classifier. For the problem of gene function prediction, however, the available information...
Main Authors: | Zhao, Xing-Ming; Wang, Yong; Chen, Luonan; Aihara, Kazuyuki |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2275242/ https://www.ncbi.nlm.nih.gov/pubmed/18221567 http://dx.doi.org/10.1186/1471-2105-9-57 |
_version_ | 1782151839994609664 |
---|---|
author | Zhao, Xing-Ming; Wang, Yong; Chen, Luonan; Aihara, Kazuyuki |
author_sort | Zhao, Xing-Ming |
collection | PubMed |
description | BACKGROUND: In general, gene function prediction can be formalized as a classification problem using machine learning techniques. Usually, both labeled positive and negative samples are needed to train the classifier. For the problem of gene function prediction, however, the available information is only about positive samples. In other words, we know which genes have the function of interest, while it is generally unclear which genes do not have the function, i.e. the negative samples. If all the genes outside the target functional family are treated as negative samples, a class imbalance problem arises, because only a relatively small number of genes are annotated in each family. Furthermore, the classifier may be degraded by false negatives among the heuristically generated negative samples. RESULTS: In this paper, we present a new technique, namely Annotating Genes with Positive Samples (AGPS), for defining negative samples in gene function prediction. With the defined negative samples, it is straightforward to predict the functions of unknown genes. In addition, the AGPS algorithm is able to integrate various kinds of data sources to predict gene functions in a reliable and accurate manner. With one-class and two-class Support Vector Machines as the core learning algorithms, the AGPS algorithm shows good performance for function prediction on yeast genes. CONCLUSION: We propose a new method for defining negative samples in gene function prediction. Experimental results on yeast genes show that AGPS yields good performance on both training and test sets. In addition, the overlap between prediction results and GO annotations on unknown genes also demonstrates the effectiveness of the proposed method. |
format | Text |
id | pubmed-2275242 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2275242 2008-03-26 Gene function prediction using labeled and unlabeled data Zhao, Xing-Ming; Wang, Yong; Chen, Luonan; Aihara, Kazuyuki BMC Bioinformatics Research Article BioMed Central 2008-01-28 /pmc/articles/PMC2275242/ /pubmed/18221567 http://dx.doi.org/10.1186/1471-2105-9-57 Text en Copyright © 2008 Zhao et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Gene function prediction using labeled and unlabeled data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2275242/ https://www.ncbi.nlm.nih.gov/pubmed/18221567 http://dx.doi.org/10.1186/1471-2105-9-57 |
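
The abstract in this record describes learning from positive samples only: negatives are first defined from the unannotated genes, and a two-class classifier is then trained on positives plus those defined negatives. The following is a minimal sketch of that general two-step idea on toy 2-D data, not the AGPS algorithm itself. To stay dependency-free it swaps in a distance-to-centroid score where the record mentions a one-class SVM, and a nearest-centroid rule where it mentions a two-class SVM. All function names, the toy data, and the `n_negatives` parameter are assumptions made for illustration.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_reliable_negatives(positives, unlabeled, n_negatives):
    """Step 1 (stand-in for a one-class model): keep the unlabeled
    points that lie farthest from the centroid of the positives."""
    c = centroid(positives)
    ranked = sorted(unlabeled, key=lambda p: distance(p, c), reverse=True)
    return ranked[:n_negatives]


def train_two_class(positives, negatives):
    """Step 2 (stand-in for a two-class model): nearest-centroid rule.
    Returns a function mapping a vector to True (positive) or False."""
    pos_c = centroid(positives)
    neg_c = centroid(negatives)

    def predict(x):
        return distance(x, pos_c) < distance(x, neg_c)

    return predict


# Toy data: labeled positives near (0, 0); the unlabeled pool mixes
# hidden positives near (0, 0) with hidden negatives near (5, 5).
rng = random.Random(0)
positives = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(20)]
hidden_pos = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(10)]
hidden_neg = [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(40)]
unlabeled = hidden_pos + hidden_neg

# Deliberately pick fewer negatives than exist, so that borderline
# unlabeled points are not forced into the negative set.
negatives = select_reliable_negatives(positives, unlabeled, n_negatives=15)
predict = train_two_class(positives, negatives)
predicted_positive = [x for x in unlabeled if predict(x)]
```

Choosing only the most distant unlabeled points as negatives reflects the concern raised in the abstract that treating every unannotated gene as negative introduces false negatives and class imbalance; how many to keep is a tuning choice in this sketch.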