Construction and use of gene expression covariation matrix

BACKGROUND: One essential step in the massive analysis of transcriptomic profiles is the calculation of the correlation coefficient, a value used to select pairs of genes with similar or inverse transcriptional profiles across a large fraction of the biological conditions examined. Until now, the ch...

Bibliographic Details
Main Authors:	Hennetin, Jérôme; Pehkonen, Petri; Bellis, Michel
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2720390/ https://www.ncbi.nlm.nih.gov/pubmed/19594909 http://dx.doi.org/10.1186/1471-2105-10-214

_version_	1782170131594477568
author	Hennetin, Jérôme Pehkonen, Petri Bellis, Michel
author_facet	Hennetin, Jérôme Pehkonen, Petri Bellis, Michel
author_sort	Hennetin, Jérôme
collection	PubMed
description	BACKGROUND: One essential step in the massive analysis of transcriptomic profiles is the calculation of the correlation coefficient, a value used to select pairs of genes with similar or inverse transcriptional profiles across a large fraction of the biological conditions examined. Until now, the choice between the two available methods for calculating the coefficient has been dictated mainly by technological considerations. Specifically, in analyses based on double-channel techniques, researchers have been required to use covariation correlation, i.e. the correlation between gene expression changes measured between several pairs of biological conditions, expressed for example as fold-change. In contrast, in analyses of single-channel techniques, scientists have been restricted to the use of coexpression correlation, i.e. correlation between gene expression levels. To our knowledge, nobody has ever examined the possible benefits of using covariation instead of coexpression in massive analyses of single-channel microarray results. RESULTS: We describe here how single-channel techniques can be treated like double-channel techniques and used to generate both gene expression changes and covariation measures. We also present a new method that allows the calculation of both positive and negative correlation coefficients between genes. First, we perform systematic comparisons between two given biological conditions and classify, for each comparison, genes as increased (I), decreased (D), or not changed (N). As a result, the original series of n gene expression level measures assigned to each gene is replaced by an ordered string of n(n-1)/2 symbols, e.g. IDDNNIDID....DNNNNNNID, with the length of the string corresponding to the number of comparisons.
In a second step, positive and negative covariation matrices (CVM) are constructed by calculating statistically significant positive or negative correlation scores for any pair of genes by comparing their strings of symbols. CONCLUSION: This new method, applied to four different large data sets, has allowed us to construct distinct covariation matrices with similar properties. We have also developed a technique to translate these covariation networks into graphical 3D representations and found that the local assignation of the probe sets was conserved across the four chip set models used, which encompass three different species (humans, mice, and rats). The application of adapted clustering methods succeeded in delineating six conserved functional regions that we characterized using Gene Ontology information.
format	Text
id	pubmed-2720390
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-27203902009-08-04 Construction and use of gene expression covariation matrix Hennetin, Jérôme Pehkonen, Petri Bellis, Michel BMC Bioinformatics Methodology Article BACKGROUND: One essential step in the massive analysis of transcriptomic profiles is the calculation of the correlation coefficient, a value used to select pairs of genes with similar or inverse transcriptional profiles across a large fraction of the biological conditions examined. Until now, the choice between the two available methods for calculating the coefficient has been dictated mainly by technological considerations. Specifically, in analyses based on double-channel techniques, researchers have been required to use covariation correlation, i.e. the correlation between gene expression changes measured between several pairs of biological conditions, expressed for example as fold-change. In contrast, in analyses of single-channel techniques scientists have been restricted to the use of coexpression correlation, i.e. correlation between gene expression levels. To our knowledge, nobody has ever examined the possible benefits of using covariation instead of coexpression in massive analyses of single channel microarray results. RESULTS: We describe here how single-channel techniques can be treated like double-channel techniques and used to generate both gene expression changes and covariation measures. We also present a new method that allows the calculation of both positive and negative correlation coefficients between genes. First, we perform systematic comparisons between two given biological conditions and classify, for each comparison, genes as increased (I), decreased (D), or not changed (N). As a result, the original series of n gene expression level measures assigned to each gene is replaced by an ordered string of n(n-1)/2 symbols, e.g. IDDNNIDID....DNNNNNNID, with the length of the string corresponding to the number of comparisons. 
In a second step, positive and negative covariation matrices (CVM) are constructed by calculating statistically significant positive or negative correlation scores for any pair of genes by comparing their strings of symbols. CONCLUSION: This new method, applied to four different large data sets, has allowed us to construct distinct covariation matrices with similar properties. We have also developed a technique to translate these covariation networks into graphical 3D representations and found that the local assignation of the probe sets was conserved across the four chip set models used which encompass three different species (humans, mice, and rats). The application of adapted clustering methods succeeded in delineating six conserved functional regions that we characterized using Gene Ontology information. BioMed Central 2009-07-13 /pmc/articles/PMC2720390/ /pubmed/19594909 http://dx.doi.org/10.1186/1471-2105-10-214 Text en Copyright © 2009 Hennetin et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Hennetin, Jérôme Pehkonen, Petri Bellis, Michel Construction and use of gene expression covariation matrix
title	Construction and use of gene expression covariation matrix
title_full	Construction and use of gene expression covariation matrix
title_fullStr	Construction and use of gene expression covariation matrix
title_full_unstemmed	Construction and use of gene expression covariation matrix
title_short	Construction and use of gene expression covariation matrix
title_sort	construction and use of gene expression covariation matrix
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2720390/ https://www.ncbi.nlm.nih.gov/pubmed/19594909 http://dx.doi.org/10.1186/1471-2105-10-214
work_keys_str_mv	AT hennetinjerome constructionanduseofgeneexpressioncovariationmatrix AT pehkonenpetri constructionanduseofgeneexpressioncovariationmatrix AT bellismichel constructionanduseofgeneexpressioncovariationmatrix
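
The two steps summarized in the abstract (replacing each gene's n expression levels with a string of n(n-1)/2 symbols, then comparing the strings of two genes) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `change_string` and `covariation_counts`, the log2 fold-change cutoff `threshold`, and the simple counting of concordant and discordant positions are all assumptions made for illustration. In particular, the record states that the published method uses statistically significant scores, and this sketch omits any significance testing.

```python
from itertools import combinations
import math


def change_string(levels, threshold=1.0):
    """Turn n expression levels into a string of n(n-1)/2 symbols.

    Each ordered pair of conditions (i < j) is labelled I (increased),
    D (decreased) or N (not changed). The cutoff on |log2 fold change|
    is a hypothetical choice; levels are assumed to be strictly positive.
    """
    symbols = []
    for first, second in combinations(levels, 2):
        log_fc = math.log2(second / first)
        if log_fc >= threshold:
            symbols.append("I")
        elif log_fc <= -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


def covariation_counts(string_a, string_b):
    """Count concordant and discordant changes between two genes.

    positive: comparisons where both genes are I or both are D.
    negative: comparisons where one gene is I and the other is D.
    Raw counts only; this is an illustrative stand-in for the
    statistically significant scores mentioned in the abstract.
    """
    positive = sum(1 for a, b in zip(string_a, string_b) if a == b and a != "N")
    negative = sum(1 for a, b in zip(string_a, string_b) if {a, b} == {"I", "D"})
    return positive, negative


# Toy data: three genes measured in n = 4 conditions (made-up values).
gene_up = change_string([1, 2, 4, 8])        # "IIIIII"
gene_down = change_string([8, 4, 2, 1])      # "DDDDDD"
gene_step = change_string([1, 1.2, 4, 4.1])  # "NIIIIN"

print(covariation_counts(gene_up, gene_down))  # (0, 6)
print(covariation_counts(gene_up, gene_step))  # (4, 0)
```

With n = 4 conditions each string has 4 × 3 / 2 = 6 symbols. Keeping the positive and negative counts separate mirrors the distinction the abstract draws between positive and negative covariation matrices, where each matrix would hold one such value per gene pair.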
