NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms

Bibliographic Details
Main authors: Ruyssinck, Joeri, Huynh-Thu, Vân Anh, Geurts, Pierre, Dhaene, Tom, Demeester, Piet, Saeys, Yvan
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965471/
https://www.ncbi.nlm.nih.gov/pubmed/24667482
http://dx.doi.org/10.1371/journal.pone.0092709
author Ruyssinck, Joeri
Huynh-Thu, Vân Anh
Geurts, Pierre
Dhaene, Tom
Demeester, Piet
Saeys, Yvan
author_sort Ruyssinck, Joeri
collection PubMed
description One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges, the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network, in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene, and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
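The three ideas in the description (one regression problem per target gene, an ensemble built by subsampling, and rank-wise averaging over several methods) can be illustrated with a minimal sketch. It is not the published NIMEFI implementation. All function names are hypothetical, and the two base scorers (absolute Pearson correlation and a rank-based variant) are simple stand-ins for the support vector regression, elastic net, random forest and symbolic regression methods named in the abstract. The subsampling scheme shown here, which draws random subsets of samples and aggregates by mean rank, is likewise an assumption, and the paper's exact procedure may differ.

```python
import random

def ranks(scores):
    """Rank position of each item (0 = highest score)."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    r = [0] * len(scores)
    for pos, j in enumerate(order):
        r[j] = pos
    return r

def abs_correlation(X, y):
    """Stand-in base scorer: |Pearson correlation| of each column of X with y."""
    n = len(y)
    my = sum(y) / n
    sy = sum((v - my) ** 2 for v in y) ** 0.5
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        sx = sum((v - mx) ** 2 for v in col) ** 0.5
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        scores.append(abs(cov) / (sx * sy) if sx > 0 and sy > 0 else 0.0)
    return scores

def abs_rank_correlation(X, y):
    """Second stand-in scorer: |Pearson| on rank-transformed values (ties ignored)."""
    cols = [ranks([row[j] for row in X]) for j in range(len(X[0]))]
    Xr = [[c[i] for c in cols] for i in range(len(X))]
    return abs_correlation(Xr, ranks(y))

def ensemble_importance(X, y, scorer, n_rounds=30, frac=0.8, seed=0):
    """Assumed ensemble scheme: score random row subsamples, average the ranks."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max(2, int(frac * n))
    total = [0.0] * p
    for _ in range(n_rounds):
        idx = rng.sample(range(n), k)
        r = ranks(scorer([X[i] for i in idx], [y[i] for i in idx]))
        for j in range(p):
            total[j] += r[j]
    # Map mean rank to [0, 1]; 1.0 means the feature was ranked first every round.
    return [1.0 - (t / n_rounds) / max(1, p - 1) for t in total]

def infer_network(expr, scorer):
    """One regression problem per target gene; returns {(regulator, target): score}."""
    n_genes = len(expr[0])
    edges = {}
    for target in range(n_genes):
        predictors = [g for g in range(n_genes) if g != target]
        X = [[row[g] for g in predictors] for row in expr]
        y = [row[target] for row in expr]
        for g, s in zip(predictors, ensemble_importance(X, y, scorer)):
            edges[(g, target)] = s
    return edges

def rank_average(edge_scores_list):
    """Rank-wise averaging: order edges by their mean rank across methods."""
    keys = list(edge_scores_list[0])
    mean_rank = {k: 0.0 for k in keys}
    for scores in edge_scores_list:
        ordered = sorted(keys, key=lambda k: -scores[k])
        for pos, k in enumerate(ordered):
            mean_rank[k] += pos / len(edge_scores_list)
    return sorted(keys, key=lambda k: mean_rank[k])

# Toy expression data (made up): gene 1 depends on gene 0, gene 2 is independent.
data_rng = random.Random(1)
expr = []
for _ in range(40):
    g0 = data_rng.gauss(0, 1)
    expr.append([g0, 2 * g0 + data_rng.gauss(0, 0.1), data_rng.gauss(0, 1)])

edges = infer_network(expr, abs_correlation)
consensus = rank_average([edges, infer_network(expr, abs_rank_correlation)])
print(consensus[:2])  # the two highest-ranked candidate links
```

Because both stand-in scorers are symmetric, this sketch cannot tell the direction of a link; the point is only the structure of the pipeline, in which any ranking method can be dropped in as `scorer`.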
format Online
Article
Text
id pubmed-3965471
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3965471 2014-03-27 NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms Ruyssinck, Joeri Huynh-Thu, Vân Anh Geurts, Pierre Dhaene, Tom Demeester, Piet Saeys, Yvan PLoS One Research Article Public Library of Science 2014-03-25 /pmc/articles/PMC3965471/ /pubmed/24667482 http://dx.doi.org/10.1371/journal.pone.0092709 Text en © 2014 Ruyssinck et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms
topic Research Article