Learning gene regulatory networks from only positive and unlabeled data
BACKGROUND: Recently, supervised learning methods have been exploited to reconstruct gene regulatory networks from gene expression data. The reconstruction of a network is modeled as a binary classification problem for each pair of genes. A statistical classifier is trained to recognize the relation...
Main authors: | Cerulo, Luigi; Elkan, Charles; Ceccarelli, Michele |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | Research article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887423/ https://www.ncbi.nlm.nih.gov/pubmed/20444264 http://dx.doi.org/10.1186/1471-2105-11-228 |
_version_ | 1782182550249144320 |
---|---|
author | Cerulo, Luigi Elkan, Charles Ceccarelli, Michele |
author_facet | Cerulo, Luigi Elkan, Charles Ceccarelli, Michele |
author_sort | Cerulo, Luigi |
collection | PubMed |
description | BACKGROUND: Recently, supervised learning methods have been exploited to reconstruct gene regulatory networks from gene expression data. The reconstruction of a network is modeled as a binary classification problem for each pair of genes. A statistical classifier is trained to recognize the relationships between the activation profiles of gene pairs. This approach has been proven to outperform previous unsupervised methods. However, the supervised approach raises open questions. In particular, although known regulatory connections can safely be assumed to be positive training examples, obtaining negative examples is not straightforward, because definite knowledge is typically not available that a given pair of genes do not interact. RESULTS: A recent advance in research on data mining is a method capable of learning a classifier from only positive and unlabeled examples, that does not need labeled negative examples. Applied to the reconstruction of gene regulatory networks, we show that this method significantly outperforms the current state of the art of machine learning methods. We assess the new method using both simulated and experimental data, and obtain major performance improvement. CONCLUSIONS: Compared to unsupervised methods for gene network inference, supervised methods are potentially more accurate, but for training they need a complete set of known regulatory connections. A supervised method that can be trained using only positive and unlabeled data, as presented in this paper, is especially beneficial for the task of inferring gene regulatory networks, because only an incomplete set of known regulatory connections is available in public databases such as RegulonDB, TRRD, KEGG, Transfac, and IPA. |
format | Text |
id | pubmed-2887423 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2887423 2010-06-18 Learning gene regulatory networks from only positive and unlabeled data Cerulo, Luigi Elkan, Charles Ceccarelli, Michele BMC Bioinformatics Research article
BioMed Central 2010-05-05 /pmc/articles/PMC2887423/ /pubmed/20444264 http://dx.doi.org/10.1186/1471-2105-11-228 Text en Copyright ©2010 Cerulo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research article Cerulo, Luigi Elkan, Charles Ceccarelli, Michele Learning gene regulatory networks from only positive and unlabeled data |
title | Learning gene regulatory networks from only positive and unlabeled data |
title_full | Learning gene regulatory networks from only positive and unlabeled data |
title_fullStr | Learning gene regulatory networks from only positive and unlabeled data |
title_full_unstemmed | Learning gene regulatory networks from only positive and unlabeled data |
title_short | Learning gene regulatory networks from only positive and unlabeled data |
title_sort | learning gene regulatory networks from only positive and unlabeled data |
topic | Research article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887423/ https://www.ncbi.nlm.nih.gov/pubmed/20444264 http://dx.doi.org/10.1186/1471-2105-11-228 |
work_keys_str_mv | AT ceruloluigi learninggeneregulatorynetworksfromonlypositiveandunlabeleddata AT elkancharles learninggeneregulatorynetworksfromonlypositiveandunlabeleddata AT ceccarellimichele learninggeneregulatorynetworksfromonlypositiveandunlabeleddata |
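
The description field above frames network reconstruction as binary classification over gene pairs, trained from known (positive) connections and unlabeled pairs only. The record does not spell out the algorithm, so the following is a minimal sketch under stated assumptions rather than the authors' implementation: it assumes a calibration-style correction in the spirit of Elkan and Noto's positive-unlabeled approach, assumes the labeled positives are a random sample of all true positives, and uses synthetic two-dimensional feature vectors in place of real expression profiles. All function names (`fit_pu`, `pu_probability`, `make_toy_pairs`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, x):
    # g(x): estimated probability that a gene pair is *labeled* (s = 1).
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(X, s, epochs=300, lr=0.1):
    # Plain batch gradient descent on the logistic loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, label in zip(X, s):
            err = score((w, b), x) - label
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fit_pu(X, s, holdout=0.2, seed=0):
    # Hold out a share of labeled and unlabeled pairs, train on the rest,
    # then estimate c = P(s=1 | y=1) as the mean of g(x) over held-out positives.
    rng = random.Random(seed)
    pos = [i for i, si in enumerate(s) if si == 1]
    unl = [i for i, si in enumerate(s) if si == 0]
    if len(pos) < 2:
        raise ValueError("need at least two labeled positive pairs")
    rng.shuffle(pos)
    rng.shuffle(unl)
    n_pos = max(1, int(holdout * len(pos)))
    n_unl = int(holdout * len(unl))
    train = pos[n_pos:] + unl[n_unl:]
    model = train_logistic([X[i] for i in train], [s[i] for i in train])
    held = [score(model, X[i]) for i in pos[:n_pos]]
    return model, sum(held) / len(held)


def pu_probability(model, c, x):
    # Corrected estimate of P(y=1 | x) = g(x) / c, clipped to at most 1.
    return min(1.0, score(model, x) / c)


def make_toy_pairs(n=200, label_rate=0.5, seed=1):
    # Synthetic stand-in for gene-pair feature vectors (not real expression data).
    rng = random.Random(seed)
    X, s = [], []
    for _ in range(n):  # true regulatory pairs, only some of them labeled
        X.append([rng.gauss(2, 1), rng.gauss(2, 1)])
        s.append(1 if rng.random() < label_rate else 0)
    for _ in range(n):  # true non-interacting pairs, never labeled
        X.append([rng.gauss(-2, 1), rng.gauss(-2, 1)])
        s.append(0)
    return X, s


X, s = make_toy_pairs()
model, c = fit_pu(X, s)
print("estimated label frequency c:", round(c, 3))
print("likely pair:", round(pu_probability(model, c, [2.0, 2.0]), 3))
print("unlikely pair:", round(pu_probability(model, c, [-2.0, -2.0]), 3))
```

The classifier is trained to separate labeled from unlabeled pairs, so its raw score underestimates the chance that a pair is a true regulatory connection; dividing by the estimated label frequency `c` is one way to compensate. In practice the feature vector for a pair would be built from the two genes' expression profiles, and a stronger classifier could replace the hand-rolled logistic regression.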