High-Order Correlation Integration for Single-Cell or Bulk RNA-seq Data Analysis
Quantifying or labeling the sample type with high quality is a challenging task, which is a key step for understanding complex diseases. Reducing noise pollution to data and ensuring the extracted intrinsic patterns in concordance with the primary data structure are important in sample clustering an...
Main authors: | Tang, Hui; Zeng, Tao; Chen, Luonan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2019 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6497731/ https://www.ncbi.nlm.nih.gov/pubmed/31080457 http://dx.doi.org/10.3389/fgene.2019.00371 |
_version_ | 1783415518731960320 |
---|---|
author | Tang, Hui Zeng, Tao Chen, Luonan |
author_facet | Tang, Hui Zeng, Tao Chen, Luonan |
author_sort | Tang, Hui |
collection | PubMed |
description | Quantifying or labeling the sample type with high quality is a challenging task, which is a key step for understanding complex diseases. Reducing noise pollution to data and ensuring the extracted intrinsic patterns in concordance with the primary data structure are important in sample clustering and classification. Here we propose an effective data integration framework named HCI (High-order Correlation Integration), which takes advantage of a high-order correlation matrix incorporated with pattern fusion analysis (PFA), to realize high-dimensional data feature extraction. On the one hand, the high-order Pearson's correlation coefficient can highlight the latent patterns underlying noisy input datasets and thus improve the accuracy and robustness of the algorithms currently available for sample clustering. On the other hand, the PFA can identify intrinsic sample patterns efficiently from different input matrices by optimally adjusting the signal effects. To validate the effectiveness of our new method, we first applied HCI to four single-cell RNA-seq datasets to distinguish the cell types, and we found that HCI is capable of identifying the prior-known cell types of single-cell samples from scRNA-seq data with higher accuracy and robustness than other methods under different conditions. Secondly, we also integrated heterogeneous omics data from TCGA datasets and GEO datasets including bulk RNA-seq data, which outperformed the other methods at identifying distinct cancer subtypes. Within an additional case study, we also constructed the mRNA-miRNA regulatory network of colorectal cancer based on the feature weight estimated from HCI, where the differentially expressed mRNAs and miRNAs were significantly enriched in well-known functional sets of colorectal cancer, such as KEGG pathways and IPA disease annotations. All these results supported that HCI has extensive flexibility and applicability to sample clustering with different types and organizations of RNA-seq data. |
format | Online Article Text |
id | pubmed-6497731 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-64977312019-05-10 High-Order Correlation Integration for Single-Cell or Bulk RNA-seq Data Analysis Tang, Hui Zeng, Tao Chen, Luonan Front Genet Genetics Quantifying or labeling the sample type with high quality is a challenging task, which is a key step for understanding complex diseases. Reducing noise pollution to data and ensuring the extracted intrinsic patterns in concordance with the primary data structure are important in sample clustering and classification. Here we propose an effective data integration framework named HCI (High-order Correlation Integration), which takes advantage of a high-order correlation matrix incorporated with pattern fusion analysis (PFA), to realize high-dimensional data feature extraction. On the one hand, the high-order Pearson's correlation coefficient can highlight the latent patterns underlying noisy input datasets and thus improve the accuracy and robustness of the algorithms currently available for sample clustering. On the other hand, the PFA can identify intrinsic sample patterns efficiently from different input matrices by optimally adjusting the signal effects. To validate the effectiveness of our new method, we first applied HCI to four single-cell RNA-seq datasets to distinguish the cell types, and we found that HCI is capable of identifying the prior-known cell types of single-cell samples from scRNA-seq data with higher accuracy and robustness than other methods under different conditions. Secondly, we also integrated heterogeneous omics data from TCGA datasets and GEO datasets including bulk RNA-seq data, which outperformed the other methods at identifying distinct cancer subtypes. Within an additional case study, we also constructed the mRNA-miRNA regulatory network of colorectal cancer based on the feature weight estimated from HCI, where the differentially expressed mRNAs and miRNAs were significantly enriched in well-known functional sets of colorectal cancer, such as KEGG pathways and IPA disease annotations. All these results supported that HCI has extensive flexibility and applicability to sample clustering with different types and organizations of RNA-seq data. Frontiers Media S.A. 2019-04-26 /pmc/articles/PMC6497731/ /pubmed/31080457 http://dx.doi.org/10.3389/fgene.2019.00371 Text en Copyright © 2019 Tang, Zeng and Chen. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Tang, Hui Zeng, Tao Chen, Luonan High-Order Correlation Integration for Single-Cell or Bulk RNA-seq Data Analysis |
title | High-Order Correlation Integration for Single-Cell or Bulk RNA-seq Data Analysis |
title_full | High-Order Correlation Integration for Single-Cell or Bulk RNA-seq Data Analysis |
title_fullStr | High-Order Correlation Integration for Single-Cell or Bulk RNA-seq Data Analysis |
title_full_unstemmed | High-Order Correlation Integration for Single-Cell or Bulk RNA-seq Data Analysis |
title_short | High-Order Correlation Integration for Single-Cell or Bulk RNA-seq Data Analysis |
title_sort | high-order correlation integration for single-cell or bulk rna-seq data analysis |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6497731/ https://www.ncbi.nlm.nih.gov/pubmed/31080457 http://dx.doi.org/10.3389/fgene.2019.00371 |
work_keys_str_mv | AT tanghui highordercorrelationintegrationforsinglecellorbulkrnaseqdataanalysis AT zengtao highordercorrelationintegrationforsinglecellorbulkrnaseqdataanalysis AT chenluonan highordercorrelationintegrationforsinglecellorbulkrnaseqdataanalysis |
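
The description above mentions a high-order Pearson's correlation coefficient computed over samples. This record does not define it, so the following is only a minimal sketch under one assumption: that "high-order" means re-applying Pearson correlation to the rows of the previous sample-by-sample correlation matrix. The function name `high_order_correlation`, the `order` parameter, and the synthetic data are hypothetical illustrations and are not taken from the HCI paper or its software; the pattern fusion analysis (PFA) step is not shown.

```python
import numpy as np


def high_order_correlation(expr, order=2):
    """Iterated Pearson correlation between samples (illustrative assumption).

    expr  : 2-D array, rows are features (e.g. genes), columns are samples.
    order : 1 gives the ordinary sample-by-sample Pearson matrix; each extra
            order correlates the rows of the previous correlation matrix.
    """
    if order < 1:
        raise ValueError("order must be >= 1")
    # First order: treat each column (sample) as a variable.
    corr = np.corrcoef(expr, rowvar=False)
    for _ in range(order - 1):
        # Higher order: each sample is now described by its correlation profile.
        corr = np.corrcoef(corr)
    return corr


# Synthetic data: 200 features, two groups of 5 noisy samples each.
rng = np.random.default_rng(0)
group_a = rng.normal(size=(200, 1)) + rng.normal(scale=2.0, size=(200, 5))
group_b = rng.normal(size=(200, 1)) + rng.normal(scale=2.0, size=(200, 5))
expr = np.hstack([group_a, group_b])

first = high_order_correlation(expr, order=1)
second = high_order_correlation(expr, order=2)
print(first.shape, second.shape)
```

The resulting matrix could be passed to any clustering routine that accepts a sample similarity matrix; whether a higher order actually improves separation depends on the data and is not demonstrated by this sketch.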