Comparison of co-expression measures: mutual information, correlation, and model based indices

Bibliographic Details
Main authors: Song, Lin; Langfelder, Peter; Horvath, Steve
Format: Online Article, Text
Language: English
Published: BioMed Central, 2012
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3586947/
https://www.ncbi.nlm.nih.gov/pubmed/23217028
http://dx.doi.org/10.1186/1471-2105-13-328

Abstract

BACKGROUND: Co-expression measures are often used to define networks among genes. Mutual information (MI) is often used as a generalized correlation measure. It is not clear how much MI adds beyond standard (robust) correlation measures or regression-model-based association measures. Further, it is important to assess which transformations of these and other co-expression measures lead to biologically meaningful modules (clusters of genes).

RESULTS: We provide a comprehensive comparison between mutual information and several correlation measures in 8 empirical data sets and in simulations. We also study different approaches for transforming an adjacency matrix, e.g. using the topological overlap measure. Overall, we confirm close relationships between MI and correlation in all data sets, which reflects the fact that most gene pairs satisfy linear or monotonic relationships. We discuss rare situations in which the two measures disagree. We also compare correlation-based and MI-based approaches for defining co-expression network modules. We show that a robust measure of correlation (the biweight midcorrelation transformed via the topological overlap transformation) leads to modules that are superior to MI-based modules and maximal information coefficient (MIC)-based modules in terms of gene ontology enrichment. We present a function that relates correlation to mutual information, which can be used to approximate the mutual information from the corresponding correlation coefficient. We propose the use of polynomial or spline regression models as an alternative to MI for capturing non-linear relationships between quantitative variables.

CONCLUSION: The biweight midcorrelation outperforms MI in terms of elucidating pairwise gene relationships. Coupled with the topological overlap matrix transformation, it often leads to more significantly enriched co-expression modules. Spline and polynomial networks form attractive alternatives to MI in the case of non-linear relationships. Our results indicate that MI networks can safely be replaced by correlation networks when it comes to measuring co-expression relationships in stationary data.
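
The abstract names three computable ingredients: the biweight midcorrelation, a relationship between correlation and MI, and the topological overlap transformation. The following plain-Python sketch illustrates them under stated assumptions. The biweight midcorrelation uses the commonly cited definition with the unscaled median absolute deviation (MAD) and the constant 9. The MI conversion uses the standard identity for a bivariate normal pair, MI = -0.5 log(1 - r^2) in nats, which is not necessarily the exact function the authors present. The adjacency uses soft thresholding |bicor|^beta with an assumed beta = 6, and TOM uses the common unsigned formula. All function names are hypothetical and are not taken from any published package.

```python
import math
import statistics


def biweight_midcorrelation(x, y):
    """Biweight midcorrelation (bicor) of two equal-length numeric sequences.

    Assumed definition: u = (v - median(v)) / (9 * MAD(v)), where MAD is the
    unscaled median absolute deviation; points with |u| >= 1 get weight 0,
    all others get weight (1 - u^2)^2.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    def weighted_deviations(v):
        med = statistics.median(v)
        mad = statistics.median(abs(t - med) for t in v)
        if mad == 0:
            raise ValueError("MAD is zero; bicor is undefined for this vector")
        out = []
        for t in v:
            u = (t - med) / (9 * mad)
            w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
            out.append((t - med) * w)
        return out

    a, b = weighted_deviations(x), weighted_deviations(y)
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den


def gaussian_mi_from_correlation(r):
    """MI in nats of a bivariate normal pair with correlation r (Gaussian assumption)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return -0.5 * math.log(1 - r * r)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned adjacency a_ij = |bicor(x_i, x_j)|^beta; beta=6 is an assumed value."""
    n = len(profiles)
    adj = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(biweight_midcorrelation(profiles[i], profiles[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1."""
    n = len(adj)
    # Connectivity k_i: sum of adjacencies to all other nodes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # l_ij sums shared-neighbour products over nodes other than i and j.
                l_ij = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
                tom[i][j] = (l_ij + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


if __name__ == "__main__":
    # Invented toy expression profiles (rows = genes, columns = samples).
    genes = [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
        [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    ]
    r = biweight_midcorrelation(genes[0], genes[1])
    print(f"bicor = {r:.3f}, Gaussian MI approx = {gaussian_mi_from_correlation(r):.3f} nats")
    print(topological_overlap(soft_threshold_adjacency(genes)))
```

Running the module prints the bicor and the Gaussian MI approximation for the first two toy profiles, then the TOM matrix for all three. Because the denominator min(k_i, k_j) + 1 - a_ij is at least 1 whenever adjacencies lie in [0, 1], the TOM step never divides by zero.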

Source: BMC Bioinformatics, Research Article. BioMed Central, published 2012-12-09.
Copyright ©2012 Song et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.