Quantitative Analysis of a Weak Correlation between Complicated Data on the Basis of Principal Component Analysis
The mining of weak correlation information between two data matrices with high complexity is a very challenging task. A new method named principal component analysis-based multiconfidence ellipse analysis (PCA/MCEA) was proposed in this study, which first applied a confidence ellipse to describe the...
Main Authors: | Tao Pang, Haitao Zhang, Liliang Wen, Jun Tang, Bing Zhou, Qianxu Yang, Yong Li, Jiajun Wang, Aiming Chen, Zhongda Zeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7843181/ https://www.ncbi.nlm.nih.gov/pubmed/33542846 http://dx.doi.org/10.1155/2021/8874827 |
_version_ | 1783644095414009856 |
---|---|
author | Pang, Tao Zhang, Haitao Wen, Liliang Tang, Jun Zhou, Bing Yang, Qianxu Li, Yong Wang, Jiajun Chen, Aiming Zeng, Zhongda |
author_facet | Pang, Tao Zhang, Haitao Wen, Liliang Tang, Jun Zhou, Bing Yang, Qianxu Li, Yong Wang, Jiajun Chen, Aiming Zeng, Zhongda |
author_sort | Pang, Tao |
collection | PubMed |
description | The mining of weak correlation information between two data matrices with high complexity is a very challenging task. A new method named principal component analysis-based multiconfidence ellipse analysis (PCA/MCEA) was proposed in this study, which first applied a confidence ellipse to describe the difference and correlation of such information among different categories of objects/samples on the basis of PCA operation of a single targeted data. This helps to find the number of objects contained in the overlapping and nonoverlapping areas of ellipses obtained from PCA runs. Then, a quantitative evaluation index of correlation between data matrices was defined by comparing the PCA results of more than one data matrix. The similarity and difference between data matrices were further quantified through comprehensively analyzing the outcomes. Complicated data of tobacco agriculture were used as an example to illustrate the strategy of the proposed method, which include rich features of climate, altitude, and chemical compositions of tobacco leaves. The number of objects of these data reached 171,516 with 14, 4, and 5 descriptors of climate, altitude, and chemicals, respectively. On the basis of the new method, the complex but weak relationships between these independent and dependent variables were studied. Three widely used but conventional methods were applied for comparison in this work. The results showed the power of the new method to discover the weak correlation between complicated data. |
format | Online Article Text |
id | pubmed-7843181 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-78431812021-02-03 Quantitative Analysis of a Weak Correlation between Complicated Data on the Basis of Principal Component Analysis Pang, Tao Zhang, Haitao Wen, Liliang Tang, Jun Zhou, Bing Yang, Qianxu Li, Yong Wang, Jiajun Chen, Aiming Zeng, Zhongda J Anal Methods Chem Research Article The mining of weak correlation information between two data matrices with high complexity is a very challenging task. A new method named principal component analysis-based multiconfidence ellipse analysis (PCA/MCEA) was proposed in this study, which first applied a confidence ellipse to describe the difference and correlation of such information among different categories of objects/samples on the basis of PCA operation of a single targeted data. This helps to find the number of objects contained in the overlapping and nonoverlapping areas of ellipses obtained from PCA runs. Then, a quantitative evaluation index of correlation between data matrices was defined by comparing the PCA results of more than one data matrix. The similarity and difference between data matrices was further quantified through comprehensively analyzing the outcomes. Complicated data of tobacco agriculture were used as an example to illustrate the strategy of the proposed method, which includes rich features of climate, altitude, and chemical compositions of tobacco leaves. The number of objects of these data reached 171,516 with 14, 4, and 5 descriptors of climate, altitude, and chemicals, respectively. On the basis of the new method, the complex but weak relationship between these independent and dependent variables were interestingly studied. Three widely used but conventional methods were applied for comparison in this work. The results showed the power of the new method to discover the weak correlation between complicated data. Hindawi 2021-01-20 /pmc/articles/PMC7843181/ /pubmed/33542846 http://dx.doi.org/10.1155/2021/8874827 Text en Copyright © 2021 Tao Pang et al. 
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Pang, Tao Zhang, Haitao Wen, Liliang Tang, Jun Zhou, Bing Yang, Qianxu Li, Yong Wang, Jiajun Chen, Aiming Zeng, Zhongda Quantitative Analysis of a Weak Correlation between Complicated Data on the Basis of Principal Component Analysis |
title | Quantitative Analysis of a Weak Correlation between Complicated Data on the Basis of Principal Component Analysis |
title_full | Quantitative Analysis of a Weak Correlation between Complicated Data on the Basis of Principal Component Analysis |
title_fullStr | Quantitative Analysis of a Weak Correlation between Complicated Data on the Basis of Principal Component Analysis |
title_full_unstemmed | Quantitative Analysis of a Weak Correlation between Complicated Data on the Basis of Principal Component Analysis |
title_short | Quantitative Analysis of a Weak Correlation between Complicated Data on the Basis of Principal Component Analysis |
title_sort | quantitative analysis of a weak correlation between complicated data on the basis of principal component analysis |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7843181/ https://www.ncbi.nlm.nih.gov/pubmed/33542846 http://dx.doi.org/10.1155/2021/8874827 |
work_keys_str_mv | AT pangtao quantitativeanalysisofaweakcorrelationbetweencomplicateddataonthebasisofprincipalcomponentanalysis AT zhanghaitao quantitativeanalysisofaweakcorrelationbetweencomplicateddataonthebasisofprincipalcomponentanalysis AT wenliliang quantitativeanalysisofaweakcorrelationbetweencomplicateddataonthebasisofprincipalcomponentanalysis AT tangjun quantitativeanalysisofaweakcorrelationbetweencomplicateddataonthebasisofprincipalcomponentanalysis AT zhoubing quantitativeanalysisofaweakcorrelationbetweencomplicateddataonthebasisofprincipalcomponentanalysis AT yangqianxu quantitativeanalysisofaweakcorrelationbetweencomplicateddataonthebasisofprincipalcomponentanalysis AT liyong quantitativeanalysisofaweakcorrelationbetweencomplicateddataonthebasisofprincipalcomponentanalysis AT wangjiajun quantitativeanalysisofaweakcorrelationbetweencomplicateddataonthebasisofprincipalcomponentanalysis AT chenaiming quantitativeanalysisofaweakcorrelationbetweencomplicateddataonthebasisofprincipalcomponentanalysis AT zengzhongda quantitativeanalysisofaweakcorrelationbetweencomplicateddataonthebasisofprincipalcomponentanalysis |
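
Illustrative note: the abstract describes fitting confidence ellipses to PCA scores and counting the objects that fall in overlapping and nonoverlapping areas. The Python sketch below shows one way that counting step could look. It is not the authors' PCA/MCEA implementation, and it does not reproduce their correlation index, which the abstract does not define. The function names (`pca_scores`, `inside_ellipse`, `overlap_counts`), the autoscaling step, the 95% level, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components.

    Columns are autoscaled first (an assumption; the abstract does not
    say how the data were preprocessed).
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ vt[:n_components].T


def inside_ellipse(points, group, level=0.95):
    """Boolean mask: which 2-D `points` lie inside the ellipse fitted to `group`."""
    mean = group.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(group, rowvar=False))
    diff = points - mean
    # Squared Mahalanobis distance of every point to the group centre.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # In two dimensions the chi-square quantile is exactly -2 * ln(1 - level).
    return d2 <= -2.0 * np.log(1.0 - level)


def overlap_counts(scores, labels, a, b, level=0.95):
    """Count objects of categories a and b in the shared and unshared ellipse areas."""
    in_a = inside_ellipse(scores, scores[labels == a], level)
    in_b = inside_ellipse(scores, scores[labels == b], level)
    pair = (labels == a) | (labels == b)
    return {
        "overlap": int(np.sum(pair & in_a & in_b)),
        "nonoverlap": int(np.sum(pair & (in_a ^ in_b))),
    }


# Synthetic stand-in for one data matrix: two categories, five descriptors.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (200, 5)), rng.normal(1.0, 1.0, (200, 5))])
labels_demo = np.repeat([0, 1], 200)
counts_demo = overlap_counts(pca_scores(X_demo), labels_demo, 0, 1)
print(counts_demo)
```

Running the same counting on the PCA scores of a second data matrix with the same category labels, and then comparing the two sets of counts, would be the natural next step toward the kind of between-matrix comparison the abstract mentions; how the published method combines them is not stated in this record.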