Inferring Regulatory Networks from Expression Data Using Tree-Based Methods

One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challeng...

Bibliographic Details
Main authors: Huynh-Thu, Vân Anh, Irrthum, Alexandre, Wehenkel, Louis, Geurts, Pierre
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2946910/
https://www.ncbi.nlm.nih.gov/pubmed/20927193
http://dx.doi.org/10.1371/journal.pone.0012776
author Huynh-Thu, Vân Anh
Irrthum, Alexandre
Wehenkel, Louis
Geurts, Pierre
collection PubMed
description One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It doesn't make any assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
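The procedure outlined in the description (one regression problem per target gene, input-gene importances read as putative links, all links pooled into one ranking) can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' GENIE3 implementation: the function names `sum_sq`, `grow_tree`, and `rank_links`, the parameters `n_trees`, `k`, and `min_size`, the square-root default for `k`, and the scaling of each target to unit variance are all assumptions made for this example. The trees are a simplified Extra-Trees-style stand-in (random thresholds, no bootstrap), and importance is measured as total variance reduction.

```python
import random


def sum_sq(values):
    """Sum of squared deviations from the mean (n times the variance)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def grow_tree(X, y, rows, importance, rng, k, min_size=5):
    """Grow one randomized regression tree over the samples in `rows`.

    Only the variance reduction of each split is kept, credited to the
    input gene that was split on; the tree itself is not stored.
    """
    parent = sum_sq([y[r] for r in rows])
    if len(rows) < min_size or parent == 0.0:
        return
    best = None
    for f in rng.sample(range(len(X[0])), k):  # k random candidate genes
        vals = [X[r][f] for r in rows]
        lo, hi = min(vals), max(vals)
        if lo == hi:
            continue
        cut = rng.uniform(lo, hi)  # random threshold, Extra-Trees style
        left = [r for r in rows if X[r][f] < cut]
        right = [r for r in rows if X[r][f] >= cut]
        if not left or not right:
            continue
        gain = parent - sum_sq([y[r] for r in left]) - sum_sq([y[r] for r in right])
        if best is None or gain > best[0]:
            best = (gain, f, left, right)
    if best is None:
        return
    gain, f, left, right = best
    importance[f] += gain
    grow_tree(X, y, left, importance, rng, k, min_size)
    grow_tree(X, y, right, importance, rng, k, min_size)


def rank_links(expr, n_trees=50, seed=0):
    """Rank putative regulatory links from an expression matrix.

    `expr` is a list of samples, each a list of p expression values.
    Returns (regulator, target, score) triples, highest score first.
    """
    rng = random.Random(seed)
    n, p = len(expr), len(expr[0])
    links = []
    for target in range(p):  # one regression problem per gene
        inputs = [g for g in range(p) if g != target]
        X = [[row[g] for g in inputs] for row in expr]
        y = [row[target] for row in expr]
        mean = sum(y) / n
        sd = (sum_sq(y) / n) ** 0.5 or 1.0
        y = [(v - mean) / sd for v in y]  # assumed: unit variance per target
        importance = [0.0] * len(inputs)
        k = max(1, int(len(inputs) ** 0.5))  # assumed default
        for _ in range(n_trees):
            grow_tree(X, y, list(range(n)), importance, rng, k)
        # Average over trees; dividing by n gives a fraction of variance.
        links += [(g, target, importance[i] / (n_trees * n))
                  for i, g in enumerate(inputs)]
    return sorted(links, key=lambda link: -link[2])
```

Calling `rank_links(expr)` on a samples-by-genes matrix yields a directed ranking, since the score for regulator i and target j comes from a different regression than the score for j and i. A network would then be read off by keeping the links above some chosen score threshold.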
format Text
id pubmed-2946910
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2946910 2010-10-06 Inferring Regulatory Networks from Expression Data Using Tree-Based Methods Huynh-Thu, Vân Anh Irrthum, Alexandre Wehenkel, Louis Geurts, Pierre PLoS One Research Article Public Library of Science 2010-09-28 /pmc/articles/PMC2946910/ /pubmed/20927193 http://dx.doi.org/10.1371/journal.pone.0012776 Text en Huynh-Thu et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Inferring Regulatory Networks from Expression Data Using Tree-Based Methods
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2946910/
https://www.ncbi.nlm.nih.gov/pubmed/20927193
http://dx.doi.org/10.1371/journal.pone.0012776