
Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study

BACKGROUND: In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks m...


Bibliographic Details
Main Authors: Feltus, F Alex, Ficklin, Stephen P, Gibson, Scott M, Smith, Melissa C
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679940/
https://www.ncbi.nlm.nih.gov/pubmed/23738693
http://dx.doi.org/10.1186/1752-0509-7-44
_version_ 1782273041302028288
author Feltus, F Alex
Ficklin, Stephen P
Gibson, Scott M
Smith, Melissa C
author_facet Feltus, F Alex
Ficklin, Stephen P
Gibson, Scott M
Smith, Melissa C
author_sort Feltus, F Alex
collection PubMed
description BACKGROUND: In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied, which also limits the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. RESULTS: A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network.
CONCLUSIONS: Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired.
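The workflow summarized in the abstract (partition the samples, then build and threshold one network per partition) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the function names (`kmeans`, `coexpression_edges`, `layered_networks`) are hypothetical, the k-means uses a deterministic farthest-first initialization, a fixed |Pearson r| cutoff stands in for the RMT thresholding step (RMT is not implemented here), and the ten-sample, five-gene matrix is hand-made toy data with an artificial "marker" gene g5 that separates two conditions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def dist2(a, b):
    """Squared Euclidean distance between two sample profiles."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(samples, k, iters=50):
    """Lloyd's k-means over samples; returns one cluster label per sample.

    Assumption: farthest-first initialization, chosen only to keep the toy deterministic.
    """
    centers = [list(samples[0])]
    while len(centers) < k:
        far = max(samples, key=lambda s: min(dist2(s, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist2(s, centers[c])) for s in samples]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def coexpression_edges(samples, genes, threshold):
    """Gene pairs whose |Pearson r| across the given samples reaches the cutoff."""
    columns = list(zip(*samples))  # one expression vector per gene
    return {
        (genes[i], genes[j])
        for i, j in combinations(range(len(genes)), 2)
        if abs(pearson(columns[i], columns[j])) >= threshold
    }


def layered_networks(samples, genes, k, threshold, min_samples=3):
    """One edge set per sample cluster (a stand-in for one network per layer)."""
    labels = kmeans(samples, k)
    layers = {}
    for c in sorted(set(labels)):
        members = [s for s, lab in zip(samples, labels) if lab == c]
        if len(members) >= min_samples:
            layers[c] = coexpression_edges(members, genes, threshold)
    return layers


# Toy data (hypothetical): rows are samples, columns are genes g1..g5.
genes = ["g1", "g2", "g3", "g4", "g5"]
condition_a = [[1, 2, 5, 3, 0], [2, 4, 3, 5, 0], [3, 6, 4, 3, 0],
               [4, 8, 5, 4, 0], [5, 10, 3, 5, 0]]
condition_b = [[3, 6, 2, 0, 100], [5, 2, 3, 2, 100], [1, 8, 4, 4, 100],
               [4, 10, 5, 6, 100], [2, 4, 6, 8, 100]]
samples = condition_a + condition_b

pooled = coexpression_edges(samples, genes, threshold=0.9)
layers = layered_networks(samples, genes, k=2, threshold=0.9)
print("pooled edges:", sorted(pooled))
print("layer edges:", {c: sorted(e) for c, e in layers.items()})
```

On this toy matrix the pooled network has no edge at the 0.9 cutoff, while the two per-cluster networks recover g1/g2 in one layer and g3/g4 in the other. The data were constructed to show that effect, so the sketch illustrates the idea only and says nothing about real expression data or about how RMT would set each layer's threshold.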
format Online
Article
Text
id pubmed-3679940
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3679940 2013-06-13 Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study Feltus, F Alex Ficklin, Stephen P Gibson, Scott M Smith, Melissa C BMC Syst Biol Methodology Article BACKGROUND: In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied, which also limits the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. RESULTS: A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network.
The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network. CONCLUSIONS: Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired. BioMed Central 2013-06-05 /pmc/articles/PMC3679940/ /pubmed/23738693 http://dx.doi.org/10.1186/1752-0509-7-44 Text en Copyright © 2013 Feltus et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Feltus, F Alex
Ficklin, Stephen P
Gibson, Scott M
Smith, Melissa C
Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study
title Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study
title_full Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study
title_fullStr Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study
title_full_unstemmed Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study
title_short Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study
title_sort maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an arabidopsis case study
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679940/
https://www.ncbi.nlm.nih.gov/pubmed/23738693
http://dx.doi.org/10.1186/1752-0509-7-44
work_keys_str_mv AT feltusfalex maximizingcaptureofgenecoexpressionrelationshipsthroughpreclusteringofinputexpressionsamplesanarabidopsiscasestudy
AT ficklinstephenp maximizingcaptureofgenecoexpressionrelationshipsthroughpreclusteringofinputexpressionsamplesanarabidopsiscasestudy
AT gibsonscottm maximizingcaptureofgenecoexpressionrelationshipsthroughpreclusteringofinputexpressionsamplesanarabidopsiscasestudy
AT smithmelissac maximizingcaptureofgenecoexpressionrelationshipsthroughpreclusteringofinputexpressionsamplesanarabidopsiscasestudy