Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data
Integrative, single-cell analyses may provide unprecedented insights into cellular and spatial diversity of the tumor microenvironment. The sparsity, noise, and high dimensionality of these data present unique challenges. Whilst approaches for integrating single-cell data are emerging and are far fr...
Main authors: | Hsu, Lauren L.; Culhane, Aedin C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2020 |
Subjects: | Oncology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7324639/ https://www.ncbi.nlm.nih.gov/pubmed/32656082 http://dx.doi.org/10.3389/fonc.2020.00973 |
_version_ | 1783551980088590336 |
---|---|
author | Hsu, Lauren L. Culhane, Aedin C. |
author_facet | Hsu, Lauren L. Culhane, Aedin C. |
author_sort | Hsu, Lauren L. |
collection | PubMed |
description | Integrative, single-cell analyses may provide unprecedented insights into cellular and spatial diversity of the tumor microenvironment. The sparsity, noise, and high dimensionality of these data present unique challenges. Whilst approaches for integrating single-cell data are emerging and are far from being standardized, most data integration, cell clustering, cell trajectory, and analysis pipelines employ a dimension reduction step, frequently principal component analysis (PCA), a matrix factorization method that is relatively fast, and can easily scale to large datasets when used with sparse-matrix representations. In this review, we provide a guide to PCA and related methods. We describe the relationship between PCA and singular value decomposition, the difference between PCA of a correlation and covariance matrix, the impact of scaling, log-transforming, and standardization, and how to recognize a horseshoe or arch effect in a PCA. We describe canonical correlation analysis (CCA), a popular matrix factorization approach for the integration of single-cell data from different platforms or studies. We discuss alternatives to CCA and why additional preprocessing or weighting datasets within the joint decomposition should be considered. |
format | Online Article Text |
id | pubmed-7324639 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7324639 2020-07-10 Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data Hsu, Lauren L. Culhane, Aedin C. Front Oncol Oncology Integrative, single-cell analyses may provide unprecedented insights into cellular and spatial diversity of the tumor microenvironment. The sparsity, noise, and high dimensionality of these data present unique challenges. Whilst approaches for integrating single-cell data are emerging and are far from being standardized, most data integration, cell clustering, cell trajectory, and analysis pipelines employ a dimension reduction step, frequently principal component analysis (PCA), a matrix factorization method that is relatively fast, and can easily scale to large datasets when used with sparse-matrix representations. In this review, we provide a guide to PCA and related methods. We describe the relationship between PCA and singular value decomposition, the difference between PCA of a correlation and covariance matrix, the impact of scaling, log-transforming, and standardization, and how to recognize a horseshoe or arch effect in a PCA. We describe canonical correlation analysis (CCA), a popular matrix factorization approach for the integration of single-cell data from different platforms or studies. We discuss alternatives to CCA and why additional preprocessing or weighting datasets within the joint decomposition should be considered. Frontiers Media S.A. 2020-06-23 /pmc/articles/PMC7324639/ /pubmed/32656082 http://dx.doi.org/10.3389/fonc.2020.00973 Text en Copyright © 2020 Hsu and Culhane. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Oncology Hsu, Lauren L. Culhane, Aedin C. Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data |
title | Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data |
title_full | Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data |
title_fullStr | Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data |
title_full_unstemmed | Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data |
title_short | Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data |
title_sort | impact of data preprocessing on integrative matrix factorization of single cell data |
topic | Oncology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7324639/ https://www.ncbi.nlm.nih.gov/pubmed/32656082 http://dx.doi.org/10.3389/fonc.2020.00973 |
work_keys_str_mv | AT hsulaurenl impactofdatapreprocessingonintegrativematrixfactorizationofsinglecelldata AT culhaneaedinc impactofdatapreprocessingonintegrativematrixfactorizationofsinglecelldata |
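
The description field in this record mentions the relationship between PCA and singular value decomposition, and the difference between PCA of a covariance matrix and PCA of a correlation matrix. The following is a minimal NumPy sketch of those two points, not code from the article: the toy data, the helper name `pca_svd`, and the choice of samples in rows are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy matrix (assumption): 50 samples in rows, 4 features in columns,
# with deliberately different scales per feature.
X = rng.normal(size=(50, 4)) * np.array([1.0, 5.0, 0.5, 20.0])


def pca_svd(X, standardize=False):
    """PCA via SVD of the column-centered matrix (hypothetical helper).

    With standardize=False this corresponds to PCA of the covariance matrix;
    with standardize=True each column is also scaled to unit variance, which
    corresponds to PCA of the correlation matrix.
    """
    Z = X - X.mean(axis=0)
    if standardize:
        Z = Z / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)  # variance explained per component
    scores = U * s                       # sample coordinates on the components
    return variances, Vt, scores


cov_var, cov_axes, cov_scores = pca_svd(X)
cor_var, cor_axes, cor_scores = pca_svd(X, standardize=True)

# Reference values: eigenvalues of the covariance and correlation matrices,
# sorted in decreasing order to match the SVD ordering.
cov_eig = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
cor_eig = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

print(np.allclose(cov_var, cov_eig), np.allclose(cor_var, cor_eig))
```

In this sketch the squared singular values, divided by n - 1, match the eigenvalues of the covariance matrix, or of the correlation matrix once the columns are standardized. Without standardization the large-scale features dominate the leading components, which is one way to see why the scaling choices named in the description matter.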