BIRD: identifying cell doublets via biallelic expression from single cells

Bibliographic details
Main authors: Wainer-Katsir, Kerem; Linial, Michal
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2020
Subjects: Macromolecular Sequence, Structure, and Function
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7355245/
https://www.ncbi.nlm.nih.gov/pubmed/32657402
http://dx.doi.org/10.1093/bioinformatics/btaa474
Collection: PubMed
Description:
SUMMARY: Current technologies for single-cell transcriptomics allow thousands of cells to be analyzed in a single experiment. The increased scale of these methods raises the risk of contamination by cell doublets. Available tools and algorithms for identifying doublets and estimating their occurrence in single-cell experimental data focus on doublets of different species, cell types or individuals. In this study, we analyze transcriptomic data from single cells having an identical genetic background. We claim that the ratio of monoallelic to biallelic expression provides discriminating power for identifying doublets. We present a pipeline called BIallelic Ratio for Doublets (BIRD) that relies on heterologous genetic variations detected in single-cell RNA sequencing data. For each dataset, doublets were artificially created from the actual data and used to train a predictive model. BIRD was applied to Smart-seq data from 163 primary fibroblast single cells. The model achieved 100% accuracy in annotating the randomly simulated doublets. Bona fide doublets were verified based on a biallelic expression signal on the X chromosome of female fibroblasts. On a collection of ∼13 300 human peripheral blood cells profiled with 10X Genomics microfluidics, BIRD achieved on average 83% (±3.7%) accuracy and an area under the curve of 0.88 (±0.04). BIRD addresses doublets formed from mixtures of cells with identical genetic background and cell identity. Maximal performance is achieved for high-coverage Smart-seq data. Success in identifying doublets is data specific, varying with the experimental methodology, genomic diversity between haplotypes, sequence coverage and depth.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
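The abstract describes BIRD's two core ideas only in prose: a per-cell signal built from monoallelic versus biallelic expression at heterozygous sites, and training data obtained by simulating doublets from the dataset's own cells. The Python below is a minimal sketch of those two ideas under our own assumptions, not BIRD's code. The input format (`AlleleCounts`, per-site reference/alternative read counts), the `min_allele_reads` threshold, all function names and the rank-based AUC helper are hypothetical. Summing the counts of two cells is a simplified way to simulate a doublet, and the predictive model that BIRD trains is not reproduced here.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format (an assumption, not BIRD's): for one cell, a map from
# a heterozygous site ID to (reference-allele reads, alternative-allele reads).
AlleleCounts = Dict[str, Tuple[int, int]]


def allelic_profile(cell: AlleleCounts, min_allele_reads: int = 1) -> Tuple[int, int]:
    """Return (monoallelic, biallelic) counts over the sites expressed in `cell`.

    A site is expressed if it has any reads; it is called biallelic when both
    alleles reach `min_allele_reads` (an illustrative threshold), else monoallelic.
    """
    mono = bi = 0
    for ref, alt in cell.values():
        if ref + alt == 0:
            continue  # site not expressed in this cell
        if ref >= min_allele_reads and alt >= min_allele_reads:
            bi += 1
        else:
            mono += 1
    return mono, bi


def biallelic_fraction(cell: AlleleCounts, min_allele_reads: int = 1) -> float:
    """Fraction of expressed sites that are biallelic; 0.0 if nothing is expressed."""
    mono, bi = allelic_profile(cell, min_allele_reads)
    total = mono + bi
    return bi / total if total else 0.0


def make_doublet(a: AlleleCounts, b: AlleleCounts) -> AlleleCounts:
    """Simulate a doublet by summing the two cells' allele counts site by site."""
    merged: AlleleCounts = {}
    for site in set(a) | set(b):
        ref_a, alt_a = a.get(site, (0, 0))
        ref_b, alt_b = b.get(site, (0, 0))
        merged[site] = (ref_a + ref_b, alt_a + alt_b)
    return merged


def simulate_doublets(cells: Sequence[AlleleCounts], n: int, seed: int = 0) -> List[AlleleCounts]:
    """Build `n` artificial doublets, each from a random pair of distinct cells."""
    if len(cells) < 2:
        raise ValueError("need at least two cells to simulate doublets")
    rng = random.Random(seed)
    doublets = []
    for _ in range(n):
        i, j = rng.sample(range(len(cells)), 2)  # two distinct indices
        doublets.append(make_doublet(cells[i], cells[j]))
    return doublets


def rank_auc(singlet_scores: Sequence[float], doublet_scores: Sequence[float]) -> float:
    """AUC as P(doublet score > singlet score), counting ties as 1/2."""
    wins = 0.0
    for d in doublet_scores:
        for s in singlet_scores:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(doublet_scores) * len(singlet_scores))


if __name__ == "__main__":
    # Toy cells in which each site is read from only one allele per cell.
    cells = [
        {"s1": (5, 0), "s2": (0, 3), "s3": (4, 0)},
        {"s1": (0, 4), "s2": (2, 0), "s3": (3, 0)},
        {"s1": (6, 0), "s2": (3, 0), "s3": (0, 2)},
    ]
    singlet = [biallelic_fraction(c) for c in cells]
    doublet = [biallelic_fraction(d) for d in simulate_doublets(cells, 6)]
    print(singlet, doublet, rank_auc(singlet, doublet))
```

The toy cells loosely mimic the X-chromosome case that the abstract uses for verification. Each cell reads only one allele per site, so merging any two cells produces biallelic sites. The sketch scores cells by the biallelic fraction f rather than by the monoallelic:biallelic ratio directly. Because mono/bi = (1 - f)/f, ranking by f is the reverse of ranking by that ratio, and it avoids dividing by zero for cells with no biallelic sites.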
Record ID: pubmed-7355245
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics (section: Macromolecular Sequence, Structure, and Function)
Publication date: 2020-07 (online 2020-07-13)
License: © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com