A nonlinear correlation measure with applications to gene expression data
Main authors: | Tripathi, Yogesh M.; Chatla, Suneel Babu; Chang, Yuan-Chin I.; Huang, Li-Shan; Shieh, Grace S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9212159/ https://www.ncbi.nlm.nih.gov/pubmed/35727808 http://dx.doi.org/10.1371/journal.pone.0270270 |
_version_ | 1784730518255828992 |
---|---|
author | Tripathi, Yogesh M.; Chatla, Suneel Babu; Chang, Yuan-Chin I.; Huang, Li-Shan; Shieh, Grace S. |
collection | PubMed |
description | Nonlinear correlation exists in many types of biomedical data. Several types of pairwise gene expression in humans and other organisms show nonlinear correlation across time, e.g., genes involved in the differentiation of human T helper 17 (Th17) cells, which motivated this study. The proposed procedure, called Kernelized correlation (K_c), first maps nonlinear data on the plane through a function (a kernel, usually nonlinear) into a high-dimensional (Hilbert) space. The transformed data are then plugged into a classical correlation coefficient, e.g., Pearson's correlation coefficient (r), to yield a nonlinear correlation measure. An algorithm to compute K_c is developed, and R code is provided online. In three simulated nonlinear cases with moderate noise, K_c with the RBF kernel (K_c-RBF) outperforms Pearson's r and the well-known distance correlation (dCor). When noise is low, however, Pearson's r and dCor perform slightly better than K_c-RBF in Cases 1 and 3 and equivalently to it in Case 2; Kendall's tau performs worse than all of these measures in every case. In Application 1, which seeks genes involved in early Th17 cell differentiation, K_c detects nonlinear correlations of four genes with IL17A (a known marker gene), whereas dCor detects nonlinear correlations in two pairs and DESeq fails to detect any of these pairs. Next, K_c outperforms Pearson's r and dCor in estimating the nonlinear correlation of negatively correlated gene pairs in yeast cell cycle regulation. In conclusion, K_c is a simple and competent procedure for measuring pairwise nonlinear correlations. |
format | Online Article Text |
id | pubmed-9212159 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9212159, 2022-06-22. PLoS One, Research Article. Public Library of Science, 2022-06-21. Text, en. © 2022 Tripathi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | A nonlinear correlation measure with applications to gene expression data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9212159/ https://www.ncbi.nlm.nih.gov/pubmed/35727808 http://dx.doi.org/10.1371/journal.pone.0270270 |
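The description states that K_c first maps the data through a kernel into a high-dimensional (Hilbert) space and then plugs the transformed data into a classical coefficient such as Pearson's r. The Python sketch below illustrates one plausible reading of that recipe; it is not the authors' R code, and their exact definition may differ. It assumes that a single kernel is applied to both variables after standardization, and that Pearson's formula is evaluated in feature space using kernel evaluations only (the kernel trick). The function names, the `standardize` step, and the RBF bandwidth `gamma` are hypothetical choices made for illustration.

```python
# Hypothetical sketch of a "kernelized Pearson" correlation.
# Assumptions (not taken from the paper): one shared kernel for x and y,
# inputs standardized first, Pearson's formula evaluated via kernel values.
import math
from statistics import fmean, pstdev
from typing import Callable, List, Sequence

# A kernel returns the inner product of two scalars in some feature (Hilbert) space.
Kernel = Callable[[float, float], float]


def linear_kernel(a: float, b: float) -> float:
    """k(a, b) = a * b. With this kernel the measure reduces to Pearson's r."""
    return a * b


def rbf_kernel(gamma: float = 1.0) -> Kernel:
    """Return the RBF kernel k(a, b) = exp(-gamma * (a - b)**2).

    `gamma` is a hypothetical bandwidth; no particular value from the paper is assumed.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")

    def k(a: float, b: float) -> float:
        return math.exp(-gamma * (a - b) ** 2)

    return k


def standardize(v: Sequence[float]) -> List[float]:
    """Center to mean 0 and scale to (population) standard deviation 1."""
    mu, sd = fmean(v), pstdev(v)
    if sd == 0.0:
        raise ValueError("input has zero variance")
    return [(t - mu) / sd for t in v]


def centered_inner_sum(k: Kernel, u: Sequence[float], v: Sequence[float]) -> float:
    """Sum over i of <phi(u_i) - mean(phi(u)), phi(v_i) - mean(phi(v))>, via k only.

    Expanding the inner products gives
        sum_i k(u_i, v_i) - (1/n) * sum_{i,j} k(u_i, v_j),
    so the feature map phi never has to be computed explicitly.
    """
    n = len(u)
    diagonal = sum(k(a, b) for a, b in zip(u, v))
    total = sum(k(a, b) for a in u for b in v)
    return diagonal - total / n


def kernelized_pearson(x: Sequence[float], y: Sequence[float],
                       k: Kernel = linear_kernel) -> float:
    """Pearson's formula evaluated on kernel-transformed data (hypothetical sketch)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    zx, zy = standardize(x), standardize(y)
    sxy = centered_inner_sum(k, zx, zy)  # "covariance" in feature space
    sxx = centered_inner_sum(k, zx, zx)  # "variance" of x in feature space
    syy = centered_inner_sum(k, zy, zy)  # "variance" of y in feature space
    if sxx <= 0.0 or syy <= 0.0:
        raise ValueError("zero variance in feature space")
    # By Cauchy-Schwarz in the feature space, the ratio lies in [-1, 1].
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    t = [i / 10 for i in range(-20, 21)]
    parabola = [v * v for v in t]  # a purely nonlinear, symmetric relationship
    print("linear kernel (Pearson's r):", kernelized_pearson(t, parabola))
    print("RBF kernel, gamma=0.5:", kernelized_pearson(t, parabola, rbf_kernel(0.5)))
```

With `linear_kernel` the feature map is the identity, so the sketch reduces (up to rounding) to Pearson's r, which is essentially 0 for the symmetric parabola in the demo; swapping in `rbf_kernel` gives a value that depends on `gamma`. None of these numbers should be read as reproducing the results reported in the paper.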