Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data

Gene-to-gene networks, such as Gene Regulatory Networks (GRN) and Predictive Expression Networks (PEN), capture relationships between genes and are beneficial for use in downstream biological analyses. There exist multiple network inference tools to produce these gene-to-gene networks from matrices...

Bibliographic Details
Main authors: Walker, Angelica M., Cliff, Ashley, Romero, Jonathon, Shah, Manesh B., Jones, Piet, Felipe Machado Gazolla, Joao Gabriel, Jacobson, Daniel A, Kainer, David
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9260260/
https://www.ncbi.nlm.nih.gov/pubmed/35832622
http://dx.doi.org/10.1016/j.csbj.2022.06.037
author Walker, Angelica M.
Cliff, Ashley
Romero, Jonathon
Shah, Manesh B.
Jones, Piet
Felipe Machado Gazolla, Joao Gabriel
Jacobson, Daniel A
Kainer, David
author_sort Walker, Angelica M.
collection PubMed
description Gene-to-gene networks, such as Gene Regulatory Networks (GRN) and Predictive Expression Networks (PEN), capture relationships between genes and are beneficial for use in downstream biological analyses. There exist multiple network inference tools to produce these gene-to-gene networks from matrices of gene expression data. Random Forest-Leave One Out Prediction (RF-LOOP), frequently known as GEne Network Inference with Ensemble of trees (GENIE3), is a method that has been shown to be efficient at producing these gene-to-gene networks. Random Forest can be replaced in this process by iterative Random Forest (iRF), which performs variable selection and boosting. Here we validate that iterative Random Forest-Leave One Out Prediction (iRF-LOOP) produces higher-quality networks than GENIE3 (RF-LOOP). We use both synthetic and empirical networks from the Dialogue for Reverse Engineering Assessment and Methods (DREAM) Challenges by Sage Bionetworks, as well as two additional empirical networks created from Arabidopsis thaliana and Populus trichocarpa expression data.
format Online
Article
Text
id pubmed-9260260
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-9260260 2022-07-12 Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data Walker, Angelica M. Cliff, Ashley Romero, Jonathon Shah, Manesh B. Jones, Piet Felipe Machado Gazolla, Joao Gabriel Jacobson, Daniel A Kainer, David Comput Struct Biotechnol J Research Article Gene-to-gene networks, such as Gene Regulatory Networks (GRN) and Predictive Expression Networks (PEN), capture relationships between genes and are beneficial for use in downstream biological analyses. There exist multiple network inference tools to produce these gene-to-gene networks from matrices of gene expression data. Random Forest-Leave One Out Prediction (RF-LOOP), frequently known as GEne Network Inference with Ensemble of trees (GENIE3), is a method that has been shown to be efficient at producing these gene-to-gene networks. Random Forest can be replaced in this process by iterative Random Forest (iRF), which performs variable selection and boosting. Here we validate that iterative Random Forest-Leave One Out Prediction (iRF-LOOP) produces higher-quality networks than GENIE3 (RF-LOOP). We use both synthetic and empirical networks from the Dialogue for Reverse Engineering Assessment and Methods (DREAM) Challenges by Sage Bionetworks, as well as two additional empirical networks created from Arabidopsis thaliana and Populus trichocarpa expression data. Research Network of Computational and Structural Biotechnology 2022-06-22 /pmc/articles/PMC9260260/ /pubmed/35832622 http://dx.doi.org/10.1016/j.csbj.2022.06.037 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9260260/
https://www.ncbi.nlm.nih.gov/pubmed/35832622
http://dx.doi.org/10.1016/j.csbj.2022.06.037