Validating module network learning algorithms using simulated data

BACKGROUND: In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Despite the demonstrated success of such algorithms in uncovering biologically relevant regulatory relations, further development...

Bibliographic Details
Main Authors: Michoel, Tom; Maere, Steven; Bonnet, Eric; Joshi, Anagha; Saeys, Yvan; Van den Bulcke, Tim; Van Leemput, Koenraad; van Remortel, Piet; Kuiper, Martin; Marchal, Kathleen; Van de Peer, Yves
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1892074/
https://www.ncbi.nlm.nih.gov/pubmed/17493254
http://dx.doi.org/10.1186/1471-2105-8-S2-S5
_version_ 1782133821116776448
author Michoel, Tom
Maere, Steven
Bonnet, Eric
Joshi, Anagha
Saeys, Yvan
Van den Bulcke, Tim
Van Leemput, Koenraad
van Remortel, Piet
Kuiper, Martin
Marchal, Kathleen
Van de Peer, Yves
author_facet Michoel, Tom
Maere, Steven
Bonnet, Eric
Joshi, Anagha
Saeys, Yvan
Van den Bulcke, Tim
Van Leemput, Koenraad
van Remortel, Piet
Kuiper, Martin
Marchal, Kathleen
Van de Peer, Yves
author_sort Michoel, Tom
collection PubMed
description BACKGROUND: In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Despite the demonstrated success of such algorithms in uncovering biologically relevant regulatory relations, further developments in the area are hampered by a lack of tools to compare the performance of alternative module network learning strategies. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance.
RESULTS: Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators.
CONCLUSION: We show that data simulators such as SynTReN are very well suited for the purpose of developing, testing and improving module network algorithms. We used SynTReN data to develop and test an alternative module network learning strategy, which is incorporated in the software package LeMoNe, and we provide evidence that this alternative strategy has several advantages with respect to existing methods.
format Text
id pubmed-1892074
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1892074 2007-06-15 Validating module network learning algorithms using simulated data Michoel, Tom Maere, Steven Bonnet, Eric Joshi, Anagha Saeys, Yvan Van den Bulcke, Tim Van Leemput, Koenraad van Remortel, Piet Kuiper, Martin Marchal, Kathleen Van de Peer, Yves BMC Bioinformatics Research BACKGROUND: In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Despite the demonstrated success of such algorithms in uncovering biologically relevant regulatory relations, further developments in the area are hampered by a lack of tools to compare the performance of alternative module network learning strategies. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance. RESULTS: Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets.
Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators. CONCLUSION: We show that data simulators such as SynTReN are very well suited for the purpose of developing, testing and improving module network algorithms. We used SynTReN data to develop and test an alternative module network learning strategy, which is incorporated in the software package LeMoNe, and we provide evidence that this alternative strategy has several advantages with respect to existing methods. BioMed Central 2007-05-03 /pmc/articles/PMC1892074/ /pubmed/17493254 http://dx.doi.org/10.1186/1471-2105-8-S2-S5 Text en Copyright © 2007 Michoel et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Michoel, Tom
Maere, Steven
Bonnet, Eric
Joshi, Anagha
Saeys, Yvan
Van den Bulcke, Tim
Van Leemput, Koenraad
van Remortel, Piet
Kuiper, Martin
Marchal, Kathleen
Van de Peer, Yves
Validating module network learning algorithms using simulated data
title Validating module network learning algorithms using simulated data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1892074/
https://www.ncbi.nlm.nih.gov/pubmed/17493254
http://dx.doi.org/10.1186/1471-2105-8-S2-S5