
Biomarker Prioritisation and Power Estimation Using Ensemble Gene Regulatory Network Inference

Inferring the topology of a gene regulatory network (GRN) from gene expression data is a challenging but important undertaking for gaining a better understanding of gene regulation. Key challenges include working with noisy data and dealing with a higher number of genes than samples. Although a numb...


Bibliographic Details
Main authors: Aziz, Furqan; Acharjee, Animesh; Williams, John A.; Russ, Dominic; Bravo-Merodio, Laura; Gkoutos, Georgios V.
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7660606/
https://www.ncbi.nlm.nih.gov/pubmed/33114263
http://dx.doi.org/10.3390/ijms21217886
author Aziz, Furqan
Acharjee, Animesh
Williams, John A.
Russ, Dominic
Bravo-Merodio, Laura
Gkoutos, Georgios V.
collection PubMed
description Inferring the topology of a gene regulatory network (GRN) from gene expression data is a challenging but important step towards a better understanding of gene regulation. Key challenges include working with noisy data and dealing with a higher number of genes than samples. Although a number of methods have been proposed to infer the structure of a GRN, there are large discrepancies among the inference algorithms they adopt, which makes meaningful comparison difficult. In this study, we used two methods, MIDER (Mutual Information Distance and Entropy Reduction) and PLSNET (partial least squares-based feature selection), to infer the structure of a GRN directly from data, and we computationally validated our results. Both methods were applied to gene expression datasets from inflammatory bowel disease (IBD), pancreatic ductal adenocarcinoma (PDAC), and acute myeloid leukaemia (AML) studies. For each case, gene regulators were successfully identified. For example, in the IBD dataset the UGT1A family genes were identified as key regulators, while in the PDAC dataset the SULF1 and THBS2 genes were highlighted. We further demonstrate that an ensemble-based approach that combines the output of the MIDER and PLSNET algorithms can infer the structure of a GRN from data with higher accuracy. We have also estimated the number of samples required for potential future validation studies. We present an analysis framework that supports both the prediction of candidate regulator genes for potential validation experiments and the estimation of the number of samples required for these experiments.
format Online
Article
Text
id pubmed-7660606
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7660606 2020-11-13 Int J Mol Sci Article
MDPI 2020-10-23 /pmc/articles/PMC7660606/ /pubmed/33114263 http://dx.doi.org/10.3390/ijms21217886 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Biomarker Prioritisation and Power Estimation Using Ensemble Gene Regulatory Network Inference
topic Article