scds: computational annotation of doublets in single-cell RNA sequencing data

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) technologies enable the study of transcriptional heterogeneity at the resolution of individual cells and have an increasing impact on biomedical research. However, it is known that these methods sometimes wrongly consider two or more cells as single...

Bibliographic Details
Main Authors: Bais, Abha S; Kostka, Dennis
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703774/
https://www.ncbi.nlm.nih.gov/pubmed/31501871
http://dx.doi.org/10.1093/bioinformatics/btz698
author Bais, Abha S
Kostka, Dennis
collection PubMed
description MOTIVATION: Single-cell RNA sequencing (scRNA-seq) technologies enable the study of transcriptional heterogeneity at the resolution of individual cells and have an increasing impact on biomedical research. However, it is known that these methods sometimes wrongly consider two or more cells as single cells, and that a number of so-called doublets is present in the output of such experiments. Treating doublets as single cells in downstream analyses can severely bias a study’s conclusions, and therefore computational strategies for the identification of doublets are needed. RESULTS: With scds, we propose two new approaches for in silico doublet identification: Co-expression based doublet scoring (cxds) and binary classification based doublet scoring (bcds). The co-expression based approach, cxds, utilizes binarized (absence/presence) gene expression data and, employing a binomial model for the co-expression of pairs of genes, yields interpretable doublet annotations. bcds, on the other hand, uses a binary classification approach to discriminate artificial doublets from original data. We apply our methods and existing computational doublet identification approaches to four datasets with experimental doublet annotations and find that our methods perform at least as well as the state of the art, at comparably little computational cost. We observe appreciable differences between methods and across datasets and that no approach dominates all others. In summary, scds presents a scalable, competitive approach that allows for doublet annotation of datasets with thousands of cells in a matter of seconds. AVAILABILITY AND IMPLEMENTATION: scds is implemented as a Bioconductor R package (doi: 10.18129/B9.bioc.scds). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7703774
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7703774 2020-12-07 scds: computational annotation of doublets in single-cell RNA sequencing data Bais, Abha S; Kostka, Dennis. Bioinformatics. Original Papers.
Oxford University Press 2019-09-10 /pmc/articles/PMC7703774/ /pubmed/31501871 http://dx.doi.org/10.1093/bioinformatics/btz698 Text en © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title scds: computational annotation of doublets in single-cell RNA sequencing data
topic Original Papers