
Predicting gene function using few positive examples and unlabeled ones

BACKGROUND: The large amount of available functional genomic data makes it possible to predict gene function computationally, using known functional annotations and the relationships between unknown and known genes to map unknown genes to GO functional terms. The prediction procedure is usu...


Bibliographic Details
Main authors: Chen, Yiming, Li, Zhoujun, Wang, Xiaofeng, Feng, Jiali, Hu, Xiaohua
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2975410/
https://www.ncbi.nlm.nih.gov/pubmed/21047378
http://dx.doi.org/10.1186/1471-2164-11-S2-S11
author Chen, Yiming
Li, Zhoujun
Wang, Xiaofeng
Feng, Jiali
Hu, Xiaohua
collection PubMed
description BACKGROUND: The large amount of available functional genomic data makes it possible to predict gene function computationally, using known functional annotations and the relationships between unknown and known genes to map unknown genes to GO functional terms. The prediction procedure is usually formulated as a binary classification problem. Training a binary classifier requires positive and negative example sets of roughly equal size. However, annotation databases provide only a few positively annotated genes for most functional terms; that is, there are only a few positive examples for training a classifier, which makes predicting gene function directly infeasible. RESULTS: We propose a novel approach, SPE_RNE, to train a classifier for each functional term. First, the positive example set is enlarged by creating synthetic positive examples. Second, representative negative examples are selected by training an SVM (support vector machine) iteratively to move the classification hyperplane to an appropriate place. Finally, an optimal SVM classifier is trained using a grid search technique. On a combined kernel of yeast protein sequence, microarray expression, protein-protein interaction and GO functional annotation data, we compare SPE_RNE with three other typical methods on three classical performance measures, recall R, precision P and their combination F: twoclass treats all unlabeled genes as negative examples; twoclassbal randomly selects the same number of negative examples from the unlabeled genes; PSoL selects a set of negative examples that are far from the positive examples and far from each other. CONCLUSIONS: On test data and unknown-gene data, we compute the average and variance of measure F. The experiments show that our approach has better generalization performance and practical prediction capacity. In addition, our method can also be used for other organisms such as human.
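The evaluation measures and the two simplest baselines named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that the "combination F" is the usual harmonic mean of P and R, and that twoclassbal draws as many negatives as there are positives; the abstract states neither explicitly. All function names are hypothetical.

```python
import random


def precision_recall_f(y_true, y_pred):
    """Return (P, R, F) for binary labels, where 1 marks a positive gene.

    Assumption: F is the harmonic mean of P and R (the common F1 score).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f


def twoclass_negatives(unlabeled):
    """'twoclass' baseline: every unlabeled gene is treated as a negative."""
    return list(unlabeled)


def twoclassbal_negatives(unlabeled, n_positive, seed=0):
    """'twoclassbal' baseline: a random sample of unlabeled genes.

    Assumption: the sample size equals the number of positive examples.
    """
    rng = random.Random(seed)
    pool = list(unlabeled)
    return rng.sample(pool, min(n_positive, len(pool)))
```

For example, with one true positive, one false positive and one false negative, `precision_recall_f([1, 1, 0, 0], [1, 0, 1, 0])` gives P = R = F = 0.5. The synthetic-positive and iterative-SVM steps of SPE_RNE are not sketched here because the abstract does not describe them in enough detail.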
format Text
id pubmed-2975410
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2975410 2010-11-09 Predicting gene function using few positive examples and unlabeled ones Chen, Yiming Li, Zhoujun Wang, Xiaofeng Feng, Jiali Hu, Xiaohua BMC Genomics Research BACKGROUND: The large amount of available functional genomic data makes it possible to predict gene function computationally, using known functional annotations and the relationships between unknown and known genes to map unknown genes to GO functional terms. The prediction procedure is usually formulated as a binary classification problem. Training a binary classifier requires positive and negative example sets of roughly equal size. However, annotation databases provide only a few positively annotated genes for most functional terms; that is, there are only a few positive examples for training a classifier, which makes predicting gene function directly infeasible. RESULTS: We propose a novel approach, SPE_RNE, to train a classifier for each functional term. First, the positive example set is enlarged by creating synthetic positive examples. Second, representative negative examples are selected by training an SVM (support vector machine) iteratively to move the classification hyperplane to an appropriate place. Finally, an optimal SVM classifier is trained using a grid search technique. On a combined kernel of yeast protein sequence, microarray expression, protein-protein interaction and GO functional annotation data, we compare SPE_RNE with three other typical methods on three classical performance measures, recall R, precision P and their combination F: twoclass treats all unlabeled genes as negative examples; twoclassbal randomly selects the same number of negative examples from the unlabeled genes; PSoL selects a set of negative examples that are far from the positive examples and far from each other. CONCLUSIONS: On test data and unknown-gene data, we compute the average and variance of measure F. The experiments show that our approach has better generalization performance and practical prediction capacity. In addition, our method can also be used for other organisms such as human. BioMed Central 2010-11-02 /pmc/articles/PMC2975410/ /pubmed/21047378 http://dx.doi.org/10.1186/1471-2164-11-S2-S11 Text en Copyright ©2010 Li et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Predicting gene function using few positive examples and unlabeled ones
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2975410/
https://www.ncbi.nlm.nih.gov/pubmed/21047378
http://dx.doi.org/10.1186/1471-2164-11-S2-S11