Gene function prediction using labeled and unlabeled data

Bibliographic Details
Main authors: Zhao, Xing-Ming, Wang, Yong, Chen, Luonan, Aihara, Kazuyuki
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2275242/
https://www.ncbi.nlm.nih.gov/pubmed/18221567
http://dx.doi.org/10.1186/1471-2105-9-57
author Zhao, Xing-Ming
Wang, Yong
Chen, Luonan
Aihara, Kazuyuki
collection PubMed
description BACKGROUND: In general, gene function prediction can be formalized as a classification problem solved with machine learning techniques. Usually, both labeled positive and negative samples are needed to train the classifier. For gene function prediction, however, the available information concerns only positive samples. In other words, we know which genes have the function of interest, while it is generally unclear which genes do not have the function, i.e. the negative samples. If all the genes outside the target functional family are treated as negative samples, a class-imbalance problem arises, because only a relatively small number of genes are annotated in each family. Furthermore, the classifier may be degraded by false negatives among the heuristically generated negative samples. RESULTS: In this paper, we present a new technique, namely Annotating Genes with Positive Samples (AGPS), for defining negative samples in gene function prediction. With the defined negative samples, it is straightforward to predict the functions of unknown genes. In addition, the AGPS algorithm is able to integrate various kinds of data sources to predict gene functions in a reliable and accurate manner. With one-class and two-class Support Vector Machines as the core learning algorithms, the AGPS algorithm shows good performance for function prediction on yeast genes. CONCLUSION: We proposed a new method for defining negative samples in gene function prediction. Experimental results on yeast genes show that AGPS yields good performance on both training and test sets. In addition, the overlap between prediction results and GO annotations on unknown genes also demonstrates the effectiveness of the proposed method.
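The abstract states the overall strategy without giving its details: start from genes known to carry a function (positives), select a limited set of reliable negatives from the unannotated genes instead of treating every non-annotated gene as negative, then train a two-class classifier. The following is a minimal, dependency-free sketch of that positive-unlabeled workflow under stated assumptions; it is not the AGPS algorithm. As hypothetical stand-ins for the one-class and two-class SVMs named in the abstract, it scores unlabeled genes by Euclidean distance to the positive centroid, keeps the `n_neg` farthest as negatives, and classifies with a nearest-centroid rule. The gene IDs, feature vectors, `n_neg` value, and all function names are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def select_reliable_negatives(
    positives: Dict[str, Vector], unlabeled: Dict[str, Vector], n_neg: int
) -> List[str]:
    """Stand-in for the one-class step (NOT a one-class SVM): rank unlabeled genes
    by Euclidean distance from the positive centroid and keep the n_neg farthest.
    Capping n_neg avoids labeling every non-annotated gene as negative."""
    center = centroid(list(positives.values()))
    ranked = sorted(
        unlabeled, key=lambda gene: math.dist(unlabeled[gene], center), reverse=True
    )
    return ranked[:n_neg]


def train_nearest_centroid(
    pos_vectors: List[Vector], neg_vectors: List[Vector]
) -> Tuple[List[float], List[float]]:
    """Stand-in for the two-class step (NOT an SVM): one centroid per class."""
    return centroid(pos_vectors), centroid(neg_vectors)


def predict(model: Tuple[List[float], List[float]], x: Vector) -> bool:
    """True if x lies closer to the positive centroid (predicted to have the function)."""
    pos_c, neg_c = model
    return math.dist(x, pos_c) < math.dist(x, neg_c)


# Toy 2-D feature vectors; gene IDs and values are invented for illustration.
positives = {"P1": [1.0, 1.0], "P2": [1.2, 0.9], "P3": [0.9, 1.1]}
unlabeled = {"U1": [1.1, 1.0], "U2": [5.0, 5.2], "U3": [4.8, 5.1], "U4": [1.0, 1.2]}

negatives = select_reliable_negatives(positives, unlabeled, n_neg=2)
print(negatives)  # prints ['U2', 'U3']

model = train_nearest_centroid(
    list(positives.values()), [unlabeled[g] for g in negatives]
)
for gene in sorted(set(unlabeled) - set(negatives)):
    print(gene, predict(model, unlabeled[gene]))  # U1 True, then U4 True
```

Distance-based scoring keeps the sketch free of third-party dependencies; a closer match to the abstract would swap in one-class and two-class SVMs (for example scikit-learn's `OneClassSVM` and `SVC`) at the two marked steps.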
format Text
id pubmed-2275242
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2275242 2008-03-26 BMC Bioinformatics Research Article
BioMed Central 2008-01-28 /pmc/articles/PMC2275242/ /pubmed/18221567 http://dx.doi.org/10.1186/1471-2105-9-57 Text en Copyright © 2008 Zhao et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Gene function prediction using labeled and unlabeled data
topic Research Article