
Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data

BACKGROUND: The information theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression it has been suggested as a general quantity of similarity to extend commonly used...

Bibliographic Details
Main Authors: Daub, Carsten O, Steuer, Ralf, Selbig, Joachim, Kloska, Sebastian
Format: Text
Language: English
Published: BioMed Central 2004
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC516800/
https://www.ncbi.nlm.nih.gov/pubmed/15339346
http://dx.doi.org/10.1186/1471-2105-5-118
author Daub, Carsten O
Steuer, Ralf
Selbig, Joachim
Kloska, Sebastian
author_sort Daub, Carsten O
collection PubMed
description BACKGROUND: The information theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size. RESULTS: In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: The significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets and the results are compared to those obtained using other similarity measures. A C++ source code of our algorithm is available for non-commercial use from kloska@scienion.de upon request. CONCLUSION: The utilisation of mutual information as similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification, are thereby extended.
format Text
id pubmed-516800
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-516800 2004-09-14 Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data Daub, Carsten O Steuer, Ralf Selbig, Joachim Kloska, Sebastian BMC Bioinformatics Methodology Article BioMed Central 2004-08-31 /pmc/articles/PMC516800/ /pubmed/15339346 http://dx.doi.org/10.1186/1471-2105-5-118 Text en Copyright © 2004 Daub et al; licensee BioMed Central Ltd.
title Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC516800/
https://www.ncbi.nlm.nih.gov/pubmed/15339346
http://dx.doi.org/10.1186/1471-2105-5-118