Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study

BACKGROUND: In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks m...

Bibliographic Details
Main Authors:	Feltus, F Alex, Ficklin, Stephen P, Gibson, Scott M, Smith, Melissa C
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679940/ https://www.ncbi.nlm.nih.gov/pubmed/23738693 http://dx.doi.org/10.1186/1752-0509-7-44

_version_	1782273041302028288
author	Feltus, F Alex Ficklin, Stephen P Gibson, Scott M Smith, Melissa C
author_facet	Feltus, F Alex Ficklin, Stephen P Gibson, Scott M Smith, Melissa C
author_sort	Feltus, F Alex
collection	PubMed
description	BACKGROUND: In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied, which also limits the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. RESULTS: A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network.
CONCLUSIONS: Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired.
format	Online Article Text
id	pubmed-3679940
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3679940 2013-06-13 Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study Feltus, F Alex Ficklin, Stephen P Gibson, Scott M Smith, Melissa C BMC Syst Biol Methodology Article BACKGROUND: In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied, which also limits the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. RESULTS: A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network.
The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network. CONCLUSIONS: Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired. BioMed Central 2013-06-05 /pmc/articles/PMC3679940/ /pubmed/23738693 http://dx.doi.org/10.1186/1752-0509-7-44 Text en Copyright © 2013 Feltus et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Feltus, F Alex Ficklin, Stephen P Gibson, Scott M Smith, Melissa C Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study
title	Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679940/ https://www.ncbi.nlm.nih.gov/pubmed/23738693 http://dx.doi.org/10.1186/1752-0509-7-44
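
The workflow summarized in the description (partition samples, build one network per partition, threshold each network on its own) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy data, the deterministic k-means initialization, and the `threshold_for` callback are all assumptions made for illustration, and the callback is a plain stand-in for RMT thresholding, which is not implemented here.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def kmeans(samples, k, iters=50):
    """Plain Lloyd's k-means over samples; returns one cluster label per sample.
    Assumption: centers start at evenly spaced samples to keep the sketch deterministic."""
    step = max(1, len(samples) // k)
    centers = [list(s) for s in samples[::step][:k]]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(s, centers[c])))
            for s in samples
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [s for s, l in zip(samples, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def coexpression_edges(samples, threshold):
    """Gene pairs (i, j) whose |Pearson r| across the given samples is >= threshold."""
    genes = list(zip(*samples))  # one expression vector per gene
    return {
        (i, j)
        for i, j in combinations(range(len(genes)), 2)
        if abs(pearson(genes[i], genes[j])) >= threshold
    }

def layered_networks(samples, k, threshold_for):
    """One edge set per sample cluster; threshold_for(members) picks each cluster's
    cutoff (hypothetical stand-in for the per-network RMT threshold)."""
    labels = kmeans(samples, k)
    layers = {}
    for c in range(k):
        members = [s for s, l in zip(samples, labels) if l == c]
        if len(members) >= 3:  # too few samples give meaningless correlations
            layers[c] = coexpression_edges(members, threshold_for(members))
    return layers

# Toy data (invented): rows are samples, columns are genes 0..2.
# Genes 0 and 1 rise together in condition A and move oppositely in condition B;
# gene 2 only marks the condition.
toy_samples = [[float(t), float(t), 0.0] for t in range(1, 6)] + \
              [[float(t), float(6 - t), 100.0] for t in range(1, 6)]

layers = layered_networks(toy_samples, k=2, threshold_for=lambda members: 0.9)
compendium = set().union(*layers.values())
pooled = coexpression_edges(toy_samples, 0.9)
print("layered edges:", sorted(compendium))
print("pooled edges:", sorted(pooled))
```

In this toy data the relationship between genes 0 and 1 is visible inside each cluster but cancels when all samples are pooled, which mirrors, in a very reduced form, the loss of relationships the abstract attributes to combining samples from different conditions.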
