Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data

Motivation: High-throughput single-cell quantitative real-time polymerase chain reaction (qPCR) is a promising technique allowing for new insights in complex cellular processes. However, the PCR reaction can be detected only up to a certain detection limit, whereas failed reactions could be due to l...

Bibliographic Details
Main authors: Buettner, Florian; Moignard, Victoria; Göttgens, Berthold; Theis, Fabian J.
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4071202/
https://www.ncbi.nlm.nih.gov/pubmed/24618470
http://dx.doi.org/10.1093/bioinformatics/btu134
_version_ 1782322786688040960
author Buettner, Florian
Moignard, Victoria
Göttgens, Berthold
Theis, Fabian J.
author_facet Buettner, Florian
Moignard, Victoria
Göttgens, Berthold
Theis, Fabian J.
author_sort Buettner, Florian
collection PubMed
description Motivation: High-throughput single-cell quantitative real-time polymerase chain reaction (qPCR) is a promising technique allowing for new insights in complex cellular processes. However, the PCR reaction can be detected only up to a certain detection limit, whereas failed reactions could be due to low or absent expression, and the true expression level is unknown. Because this censoring can occur for high proportions of the data, it is one of the main challenges when dealing with single-cell qPCR data. Principal component analysis (PCA) is an important tool for visualizing the structure of high-dimensional data as well as for identifying subpopulations of cells. However, to date it is not clear how to perform a PCA of censored data. We present a probabilistic approach that accounts for the censoring and evaluate it for two typical datasets containing single-cell qPCR data. Results: We use the Gaussian process latent variable model framework to account for censoring by introducing an appropriate noise model and allowing a different kernel for each dimension. We evaluate this new approach for two typical qPCR datasets (of mouse embryonic stem cells and blood stem/progenitor cells, respectively) by performing linear and non-linear probabilistic PCA. Taking the censoring into account results in a 2D representation of the data, which better reflects its known structure: in both datasets, our new approach results in a better separation of known cell types and is able to reveal subpopulations in one dataset that could not be resolved using standard PCA. Availability and implementation: The implementation was based on the existing Gaussian process latent variable model toolbox (https://github.com/SheffieldML/GPmat); extensions for noise models and kernels accounting for censoring are available at http://icb.helmholtz-muenchen.de/censgplvm. Contact: fbuettner.phys@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4071202
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-40712022014-06-26 Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data Buettner, Florian Moignard, Victoria Göttgens, Berthold Theis, Fabian J. Bioinformatics Original Papers Motivation: High-throughput single-cell quantitative real-time polymerase chain reaction (qPCR) is a promising technique allowing for new insights in complex cellular processes. However, the PCR reaction can be detected only up to a certain detection limit, whereas failed reactions could be due to low or absent expression, and the true expression level is unknown. Because this censoring can occur for high proportions of the data, it is one of the main challenges when dealing with single-cell qPCR data. Principal component analysis (PCA) is an important tool for visualizing the structure of high-dimensional data as well as for identifying subpopulations of cells. However, to date it is not clear how to perform a PCA of censored data. We present a probabilistic approach that accounts for the censoring and evaluate it for two typical datasets containing single-cell qPCR data. Results: We use the Gaussian process latent variable model framework to account for censoring by introducing an appropriate noise model and allowing a different kernel for each dimension. We evaluate this new approach for two typical qPCR datasets (of mouse embryonic stem cells and blood stem/progenitor cells, respectively) by performing linear and non-linear probabilistic PCA. Taking the censoring into account results in a 2D representation of the data, which better reflects its known structure: in both datasets, our new approach results in a better separation of known cell types and is able to reveal subpopulations in one dataset that could not be resolved using standard PCA. 
Availability and implementation: The implementation was based on the existing Gaussian process latent variable model toolbox (https://github.com/SheffieldML/GPmat); extensions for noise models and kernels accounting for censoring are available at http://icb.helmholtz-muenchen.de/censgplvm. Contact: fbuettner.phys@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2014-07-01 2014-03-10 /pmc/articles/PMC4071202/ /pubmed/24618470 http://dx.doi.org/10.1093/bioinformatics/btu134 Text en © The Author 2014. Published by Oxford University Press. All rights reserved. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Buettner, Florian
Moignard, Victoria
Göttgens, Berthold
Theis, Fabian J.
Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data
title Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data
title_full Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data
title_fullStr Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data
title_full_unstemmed Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data
title_short Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data
title_sort probabilistic pca of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qpcr data
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4071202/
https://www.ncbi.nlm.nih.gov/pubmed/24618470
http://dx.doi.org/10.1093/bioinformatics/btu134
work_keys_str_mv AT buettnerflorian probabilisticpcaofcensoreddataaccountingforuncertaintiesinthevisualizationofhighthroughputsinglecellqpcrdata
AT moignardvictoria probabilisticpcaofcensoreddataaccountingforuncertaintiesinthevisualizationofhighthroughputsinglecellqpcrdata
AT gottgensberthold probabilisticpcaofcensoreddataaccountingforuncertaintiesinthevisualizationofhighthroughputsinglecellqpcrdata
AT theisfabianj probabilisticpcaofcensoreddataaccountingforuncertaintiesinthevisualizationofhighthroughputsinglecellqpcrdata