Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM


Bibliographic Details
Main Authors: Alvarez, Marcus, Rahmani, Elior, Jew, Brandon, Garske, Kristina M., Miao, Zong, Benhammou, Jihane N., Ye, Chun Jimmie, Pisegna, Joseph R., Pietiläinen, Kirsi H., Halperin, Eran, Pajukanta, Päivi
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7335186/
https://www.ncbi.nlm.nih.gov/pubmed/32620816
http://dx.doi.org/10.1038/s41598-020-67513-5
author Alvarez, Marcus
Rahmani, Elior
Jew, Brandon
Garske, Kristina M.
Miao, Zong
Benhammou, Jihane N.
Ye, Chun Jimmie
Pisegna, Joseph R.
Pietiläinen, Kirsi H.
Halperin, Eran
Pajukanta, Päivi
collection PubMed
description Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. We observe that snRNA-seq is commonly subject to contamination by high amounts of ambient RNA, which, if overlooked, can bias downstream analyses, for example by producing spurious cell types. We present a novel approach to quantify contamination and filter droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distributions of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: (1) human differentiating preadipocytes in vitro, (2) fresh mouse brain tissue, and (3) human frozen adipose tissue (AT) from six individuals. All three data sets showed evidence of extranuclear RNA contamination, and we observed that existing methods failed to account for contaminated droplets and led to spurious cell types. Compared to filtering with these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher-quality clusters. Although DIEM was designed for snRNA-seq, our clustering strategy also successfully filtered single-cell RNA-seq data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.
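The abstract describes a likelihood-based model of debris and cell-type expression that is fitted with EM. The following is a minimal sketch of that general idea, not the DIEM implementation: it assumes a two-component multinomial mixture (debris versus a single nucleus profile, whereas the abstract refers to several cell types) and assumes that some droplets can be clamped to the debris label in advance, which is where the semi-supervision comes from. The function name `em_debris_filter`, its parameters, and the toy counts are all hypothetical.

```python
import math


def em_debris_filter(counts, debris_idx, n_iter=100, pseudo=0.01):
    """Fit a two-component multinomial mixture with EM.

    Component 0 is debris, component 1 is nucleus (a simplifying assumption).
    counts: list of per-droplet gene count lists, all of equal length.
    debris_idx: set of droplet indices clamped to the debris label.
    Returns the posterior probability of debris for every droplet.
    """
    n, g = len(counts), len(counts[0])
    # Clamped droplets are debris; all others start undecided.
    resp = [[1.0, 0.0] if i in debris_idx else [0.5, 0.5] for i in range(n)]
    for _ in range(n_iter):
        # M-step: mixing weights and smoothed per-gene proportions.
        weights = [sum(r[k] for r in resp) / n for k in range(2)]
        log_p = []
        for k in range(2):
            totals = [
                pseudo + sum(resp[i][k] * counts[i][j] for i in range(n))
                for j in range(g)
            ]
            z = sum(totals)
            log_p.append([math.log(t / z) for t in totals])
        # E-step: update posteriors of unclamped droplets in log space.
        for i in range(n):
            if i in debris_idx:
                continue
            log_post = [
                math.log(max(weights[k], 1e-300))
                + sum(c * lp for c, lp in zip(counts[i], log_p[k]))
                for k in range(2)
            ]
            m = max(log_post)
            unnorm = [math.exp(v - m) for v in log_post]
            s = sum(unnorm)
            resp[i] = [u / s for u in unnorm]
    return [r[0] for r in resp]


# Toy data (made up): three genes, droplets 0-2 clamped as debris.
toy_counts = [
    [8, 1, 1], [9, 2, 0], [7, 2, 1],       # clamped low-count droplets
    [40, 5, 5],                            # high count, debris-like profile
    [5, 10, 60], [4, 12, 50], [6, 9, 55],  # nucleus-like profile
]
posterior = em_debris_filter(toy_counts, debris_idx={0, 1, 2})
keep = [i for i, p in enumerate(posterior) if p < 0.5]
```

In this sketch a droplet is kept when its posterior debris probability is below 0.5. Droplet 3 has a high total count but a debris-like profile, so a filter based only on total counts would retain it, while a profile-based posterior can flag it.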
format Online
Article
Text
id pubmed-7335186
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7335186 2020-07-07 Sci Rep Article
Nature Publishing Group UK 2020-07-03 /pmc/articles/PMC7335186/ /pubmed/32620816 http://dx.doi.org/10.1038/s41598-020-67513-5 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM
topic Article