
Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data

Bibliographic Details
Main authors: Hsu, Lauren L., Culhane, Aedin C.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Oncology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7324639/
https://www.ncbi.nlm.nih.gov/pubmed/32656082
http://dx.doi.org/10.3389/fonc.2020.00973
Description: Integrative, single-cell analyses may provide unprecedented insights into cellular and spatial diversity of the tumor microenvironment. The sparsity, noise, and high dimensionality of these data present unique challenges. Whilst approaches for integrating single-cell data are emerging and are far from being standardized, most data integration, cell clustering, cell trajectory, and analysis pipelines employ a dimension reduction step, frequently principal component analysis (PCA), a matrix factorization method that is relatively fast and can easily scale to large datasets when used with sparse-matrix representations. In this review, we provide a guide to PCA and related methods. We describe the relationship between PCA and singular value decomposition, the difference between PCA of a correlation and covariance matrix, the impact of scaling, log-transforming, and standardization, and how to recognize a horseshoe or arch effect in a PCA. We describe canonical correlation analysis (CCA), a popular matrix factorization approach for the integration of single-cell data from different platforms or studies. We discuss alternatives to CCA and why additional preprocessing or weighting datasets within the joint decomposition should be considered.
Journal: Front Oncol (Oncology)
Published online: 2020-06-23
Copyright: © 2020 Hsu and Culhane.
License: This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
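Illustration (not taken from the article): the description refers to the relationship between PCA and singular value decomposition and to the difference between PCA of a covariance matrix and PCA of a correlation matrix. The sketch below shows both under stated assumptions: a small dense cells-by-genes matrix, a hypothetical helper named `pca_via_svd`, and simulated Poisson counts rather than real single-cell data. PCA of the covariance matrix is the SVD of the column-centered data; PCA of the correlation matrix is the SVD of the centered data after scaling each column to unit variance. In both cases the PC variances are the squared singular values divided by n - 1.

```python
import numpy as np


def pca_via_svd(X, standardize=False):
    """Hypothetical helper: PCA of an (n_cells x n_genes) matrix via the thin SVD.

    standardize=False: columns are only centered (PCA of the covariance matrix).
    standardize=True:  columns are centered and scaled to unit variance
                       (PCA of the correlation matrix).
    """
    Xc = X - X.mean(axis=0)                    # center each gene (column)
    if standardize:
        Xc = Xc / Xc.std(axis=0, ddof=1)       # unit sample variance per gene
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)  # Xc = U @ diag(d) @ Vt
    scores = U * d                             # cell coordinates on the PCs
    loadings = Vt.T                            # gene weights; columns are principal axes
    eigvals = d ** 2 / (X.shape[0] - 1)        # PC variances = eigenvalues of cov/corr matrix
    return scores, loadings, eigvals


# Simulated toy data (an assumption, not real single-cell data): 50 "cells" x 5 "genes"
# of Poisson counts whose means differ by two orders of magnitude.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=[1, 5, 20, 50, 100], size=(50, 5)).astype(float)
logged = np.log1p(counts)                      # log(1 + x) transform

_, _, ev_raw = pca_via_svd(counts)             # covariance PCA of untransformed counts
scores_cov, _, ev_cov = pca_via_svd(logged)    # covariance PCA of log counts
scores_cor, _, ev_cor = pca_via_svd(logged, standardize=True)  # correlation PCA

# The SVD route should agree with an eigendecomposition of the covariance
# and correlation matrices (eigvalsh returns ascending order, so reverse it).
ref_cov = np.sort(np.linalg.eigvalsh(np.cov(logged, rowvar=False)))[::-1]
ref_cor = np.sort(np.linalg.eigvalsh(np.corrcoef(logged, rowvar=False)))[::-1]

print("raw counts, covariance PCA: ", np.round(ev_raw / ev_raw.sum(), 3))
print("log counts, covariance PCA: ", np.round(ev_cov / ev_cov.sum(), 3))
print("log counts, correlation PCA:", np.round(ev_cor / ev_cor.sum(), 3))
```

Comparing the three printed proportions of variance is one way to see how log-transforming and standardization change which genes dominate the leading components: for Poisson counts the variance equals the mean, so in the untransformed covariance PCA the high-count columns tend to carry most of the variance. For large sparse single-cell matrices a truncated or randomized SVD would usually replace the dense `np.linalg.svd` call; this sketch does not cover that case.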