
A nonlinear correlation measure with applications to gene expression data

Nonlinear correlation exists in many types of biomedical data. Several types of pairwise gene expression in humans and other organisms show nonlinear correlation across time, e.g., genes involved in human T helper 17 (Th17) cell differentiation, which motivated this study. The proposed procedure, call...


Bibliographic Details
Main authors: Tripathi, Yogesh M., Chatla, Suneel Babu, Chang, Yuan-Chin I., Huang, Li-Shan, Shieh, Grace S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9212159/
https://www.ncbi.nlm.nih.gov/pubmed/35727808
http://dx.doi.org/10.1371/journal.pone.0270270
author Tripathi, Yogesh M.
Chatla, Suneel Babu
Chang, Yuan-Chin I.
Huang, Li-Shan
Shieh, Grace S.
collection PubMed
description Nonlinear correlation exists in many types of biomedical data. Several types of pairwise gene expression in humans and other organisms show nonlinear correlation across time, e.g., genes involved in human T helper 17 (Th17) cell differentiation, which motivated this study. The proposed procedure, called Kernelized correlation (K(c)), first transforms nonlinear data on the plane via a function (a kernel, usually nonlinear) to a high-dimensional (Hilbert) space. Next, the transformed data are plugged into a classical correlation coefficient, e.g., Pearson’s correlation coefficient (r), to yield a nonlinear correlation measure. An algorithm to compute K(c) is developed, and the R code is provided online. In three simulated nonlinear cases, when noise in the data is moderate, K(c) with the RBF kernel (K(c)-RBF) outperforms Pearson’s r and the well-known distance correlation (dCor). However, when noise is low, Pearson’s r and dCor perform slightly better than K(c)-RBF in Cases 1 and 3 and equivalently to it in Case 2; Kendall’s tau performs worse than the aforementioned measures in all cases. In Application 1, which aims to discover genes involved in early Th17 cell differentiation, K(c) detects the nonlinear correlations of four genes with IL17A (a known marker gene), while dCor detects nonlinear correlations of two pairs and DESeq fails in all these pairs. Next, K(c) outperforms Pearson’s r and dCor in estimating the nonlinear correlation of negatively correlated gene pairs in yeast cell cycle regulation. In conclusion, K(c) is a simple and competent procedure to measure pairwise nonlinear correlations.
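The description above outlines a two-step idea: map the data into a kernel feature space, then apply a Pearson-type correlation there. The abstract does not spell out the K(c) algorithm itself, so the sketch below is not the authors' method or their R code. It illustrates the same general idea with a related, well-known quantity, centered kernel alignment: build an RBF Gram matrix for each variable, double-center both, and take the normalized inner product, which equals Pearson's r between the entries of the two centered matrices. The function names and the default bandwidth (each variable's standard deviation) are assumptions made only for this illustration.

```python
import numpy as np


def rbf_gram(v, sigma):
    """RBF Gram matrix: K[i, j] = exp(-(v[i] - v[j])**2 / (2 * sigma**2))."""
    diff = v[:, None] - v[None, :]
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))


def center_gram(K):
    """Double-center K, which centers the mapped points in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H


def kernel_alignment(x, y, sigma_x=None, sigma_y=None):
    """Centered kernel alignment of two 1-D samples (illustrative only).

    NOTE: this is NOT the K(c) statistic from the article. It is a
    stand-in that shows the "kernel transform, then correlate" idea.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumed default bandwidth: standard deviation of each variable.
    sigma_x = np.std(x) if sigma_x is None else sigma_x
    sigma_y = np.std(y) if sigma_y is None else sigma_y
    Kx = center_gram(rbf_gram(x, sigma_x))
    Ky = center_gram(rbf_gram(y, sigma_y))
    # Centered Gram matrices have zero mean over their entries, so this
    # cosine is Pearson's r between the flattened matrices.
    return float(np.sum(Kx * Ky) / (np.linalg.norm(Kx) * np.linalg.norm(Ky)))


# Noise-free parabola on a symmetric grid: a purely nonlinear relation.
x = np.linspace(-1.0, 1.0, 101)
y = x ** 2
print("Pearson r:       ", np.corrcoef(x, y)[0, 1])
print("kernel alignment:", kernel_alignment(x, y))
```

On this toy input, Pearson's r is zero up to rounding error because x and x² are uncorrelated on a symmetric grid, whereas the kernel-based score is positive. The score depends on the bandwidth choice, so any comparison across methods should state the bandwidth used.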
format Online
Article
Text
id pubmed-9212159
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9212159 2022-06-22 A nonlinear correlation measure with applications to gene expression data Tripathi, Yogesh M. Chatla, Suneel Babu Chang, Yuan-Chin I. Huang, Li-Shan Shieh, Grace S. PLoS One Research Article
Public Library of Science 2022-06-21 /pmc/articles/PMC9212159/ /pubmed/35727808 http://dx.doi.org/10.1371/journal.pone.0270270 Text en © 2022 Tripathi et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A nonlinear correlation measure with applications to gene expression data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9212159/
https://www.ncbi.nlm.nih.gov/pubmed/35727808
http://dx.doi.org/10.1371/journal.pone.0270270