False signals induced by single-cell imputation

Background: Single-cell RNA-seq is a powerful tool for measuring gene expression at the resolution of individual cells.  A challenge in the analysis of this data is the large amount of zero values, representing either missing data or no expression. Several imputation approaches have been proposed to...

Bibliographic Details
Main Authors: Andrews, Tallulah S., Hemberg, Martin
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6415334/
https://www.ncbi.nlm.nih.gov/pubmed/30906525
http://dx.doi.org/10.12688/f1000research.16613.2
author Andrews, Tallulah S.
Hemberg, Martin
collection PubMed
description Background: Single-cell RNA-seq is a powerful tool for measuring gene expression at the resolution of individual cells. A challenge in the analysis of these data is the large number of zero values, which represent either missing data or no expression. Several imputation approaches have been proposed to address this issue, but they generally rely on structure inherent to the dataset under consideration. They may therefore not provide any additional information, and are limited by the information contained in the dataset and by the validity of their assumptions. Methods: We evaluated the risk of generating false-positive or irreproducible differential expression when imputing data with six different methods. We applied each method to a variety of simulated datasets as well as to permuted real single-cell RNA-seq datasets and considered the number of false-positive gene-gene correlations and differentially expressed genes. Using matched 10X and Smart-seq2 data, we examined whether cell-type-specific markers were reproducible across datasets derived from the same tissue before and after imputation. Results: The extent of false positives introduced by imputation varied considerably by method. Data-smoothing-based methods (MAGIC, knn-smooth and dca) generated many false positives in both real and simulated data. Model-based imputation methods typically generated fewer false positives, but this varied greatly depending on the diversity of cell types in the sample. All imputation methods decreased the reproducibility of cell-type-specific markers, although this could be mitigated by selecting markers with large effect size and significance. Conclusions: Imputation of single-cell RNA-seq data introduces circularity that can generate false-positive results. Thus, statistical tests applied to imputed data should be treated with care. Additional filtering by effect size can reduce but not fully eliminate these effects. Of the methods we considered, SAVER was the least likely to generate false or irreproducible results and thus should be favoured over alternatives if imputation is necessary.
format Online
Article
Text
id pubmed-6415334
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-6415334 2019-03-21 False signals induced by single-cell imputation Andrews, Tallulah S. Hemberg, Martin F1000Res Research Article F1000 Research Limited 2019-03-05 /pmc/articles/PMC6415334/ /pubmed/30906525 http://dx.doi.org/10.12688/f1000research.16613.2 Text en Copyright: © 2019 Andrews TS and Hemberg M http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title False signals induced by single-cell imputation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6415334/
https://www.ncbi.nlm.nih.gov/pubmed/30906525
http://dx.doi.org/10.12688/f1000research.16613.2