Cargando…

Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study

Gene co-expression networks (GCNs) can be prepared using a variety of mathematical approaches based on data sampled across diverse developmental processes, tissue types, pathologies, mutant backgrounds, and stress conditions. These networks are used to identify genes with similar expression dynamics...

Descripción completa

Detalles Bibliográficos
Autores principales: Law, Simon R., Kellgren, Therese G., Björk, Rafael, Ryden, Patrik, Keech, Olivier
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Frontiers Media S.A. 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7287149/
https://www.ncbi.nlm.nih.gov/pubmed/32582224
http://dx.doi.org/10.3389/fpls.2020.00524
_version_ 1783545008358424576
author Law, Simon R.
Kellgren, Therese G.
Björk, Rafael
Ryden, Patrik
Keech, Olivier
author_facet Law, Simon R.
Kellgren, Therese G.
Björk, Rafael
Ryden, Patrik
Keech, Olivier
author_sort Law, Simon R.
collection PubMed
description Gene co-expression networks (GCNs) can be prepared using a variety of mathematical approaches based on data sampled across diverse developmental processes, tissue types, pathologies, mutant backgrounds, and stress conditions. These networks are used to identify genes with similar expression dynamics but are prone to introducing false-positive and false-negative relationships, especially in the instance of large and heterogenous datasets. With the aim of optimizing the relevance of edges in GCNs and enhancing global biological insight, we propose a novel approach that involves a data-centering step performed simultaneously per gene and per sub-experiment, called centralization within sub-experiments (CSE). Using a gene set encoding the plant mitochondrial proteome as a case study, our results show that all CSE-based GCNs assessed had significantly more edges within the majority of the considered functional sub-networks, such as the mitochondrial electron transport chain and its complexes, than GCNs not using CSE; thus demonstrating that CSE-based GCNs are efficient at predicting canonical functions and associated pathways, here referred to as the core gene network. Furthermore, we show that correlation analyses using CSE-processed data can be used to fine-tune prediction of the function of uncharacterized genes; while its use in combination with analyses based on non-CSE data can augment conventional stress analyses with the innate connections underpinning the dynamic system being examined. Therefore, CSE is an effective alternative method to conventional batch correction approaches, particularly when dealing with large and heterogenous datasets. The method is easy to implement into a pre-existing GCN analysis pipeline and can provide enhanced biological relevance to conventional GCNs by allowing users to delineate a core gene network. AUTHOR SUMMARY: Gene co-expression networks (GCNs) are the product of a variety of mathematical approaches that identify causal relationships in gene expression dynamics but are prone to the misdiagnoses of false-positives and false-negatives, especially in the instance of large and heterogenous datasets. In light of the burgeoning output of next-generation sequencing projects performed on a variety of species, and developmental or clinical conditions; the statistical power and complexity of these networks will undoubtedly increase, while their biological relevance will be fiercely challenged. Here, we propose a novel approach to generate a “core” GCN with enhanced biological relevance. Our method involves a data-centering step that effectively removes all primary treatment/tissue effects, which is simple to employ and can be easily implemented into pre-existing GCN analysis pipelines. The gain in biological relevance resulting from the adoption of this approach was assessed using a plant mitochondrial case study.
format Online
Article
Text
id pubmed-7287149
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-72871492020-06-23 Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study Law, Simon R. Kellgren, Therese G. Björk, Rafael Ryden, Patrik Keech, Olivier Front Plant Sci Plant Science Gene co-expression networks (GCNs) can be prepared using a variety of mathematical approaches based on data sampled across diverse developmental processes, tissue types, pathologies, mutant backgrounds, and stress conditions. These networks are used to identify genes with similar expression dynamics but are prone to introducing false-positive and false-negative relationships, especially in the instance of large and heterogenous datasets. With the aim of optimizing the relevance of edges in GCNs and enhancing global biological insight, we propose a novel approach that involves a data-centering step performed simultaneously per gene and per sub-experiment, called centralization within sub-experiments (CSE). Using a gene set encoding the plant mitochondrial proteome as a case study, our results show that all CSE-based GCNs assessed had significantly more edges within the majority of the considered functional sub-networks, such as the mitochondrial electron transport chain and its complexes, than GCNs not using CSE; thus demonstrating that CSE-based GCNs are efficient at predicting canonical functions and associated pathways, here referred to as the core gene network. Furthermore, we show that correlation analyses using CSE-processed data can be used to fine-tune prediction of the function of uncharacterized genes; while its use in combination with analyses based on non-CSE data can augment conventional stress analyses with the innate connections underpinning the dynamic system being examined. Therefore, CSE is an effective alternative method to conventional batch correction approaches, particularly when dealing with large and heterogenous datasets. The method is easy to implement into a pre-existing GCN analysis pipeline and can provide enhanced biological relevance to conventional GCNs by allowing users to delineate a core gene network. AUTHOR SUMMARY: Gene co-expression networks (GCNs) are the product of a variety of mathematical approaches that identify causal relationships in gene expression dynamics but are prone to the misdiagnoses of false-positives and false-negatives, especially in the instance of large and heterogenous datasets. In light of the burgeoning output of next-generation sequencing projects performed on a variety of species, and developmental or clinical conditions; the statistical power and complexity of these networks will undoubtedly increase, while their biological relevance will be fiercely challenged. Here, we propose a novel approach to generate a “core” GCN with enhanced biological relevance. Our method involves a data-centering step that effectively removes all primary treatment/tissue effects, which is simple to employ and can be easily implemented into pre-existing GCN analysis pipelines. The gain in biological relevance resulting from the adoption of this approach was assessed using a plant mitochondrial case study. Frontiers Media S.A. 2020-06-04 /pmc/articles/PMC7287149/ /pubmed/32582224 http://dx.doi.org/10.3389/fpls.2020.00524 Text en Copyright © 2020 Law, Kellgren, Björk, Ryden and Keech. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Plant Science
Law, Simon R.
Kellgren, Therese G.
Björk, Rafael
Ryden, Patrik
Keech, Olivier
Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study
title Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study
title_full Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study
title_fullStr Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study
title_full_unstemmed Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study
title_short Centralization Within Sub-Experiments Enhances the Biological Relevance of Gene Co-expression Networks: A Plant Mitochondrial Case Study
title_sort centralization within sub-experiments enhances the biological relevance of gene co-expression networks: a plant mitochondrial case study
topic Plant Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7287149/
https://www.ncbi.nlm.nih.gov/pubmed/32582224
http://dx.doi.org/10.3389/fpls.2020.00524
work_keys_str_mv AT lawsimonr centralizationwithinsubexperimentsenhancesthebiologicalrelevanceofgenecoexpressionnetworksaplantmitochondrialcasestudy
AT kellgrenthereseg centralizationwithinsubexperimentsenhancesthebiologicalrelevanceofgenecoexpressionnetworksaplantmitochondrialcasestudy
AT bjorkrafael centralizationwithinsubexperimentsenhancesthebiologicalrelevanceofgenecoexpressionnetworksaplantmitochondrialcasestudy
AT rydenpatrik centralizationwithinsubexperimentsenhancesthebiologicalrelevanceofgenecoexpressionnetworksaplantmitochondrialcasestudy
AT keecholivier centralizationwithinsubexperimentsenhancesthebiologicalrelevanceofgenecoexpressionnetworksaplantmitochondrialcasestudy