A machine learning framework for scRNA-seq UMI threshold optimization and accurate classification of cell types

Bibliographic Details
Main Authors: Bishara, Isaac; Chen, Jinfeng; Griffiths, Jason I.; Bild, Andrea H.; Nath, Aritro
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9732024/
https://www.ncbi.nlm.nih.gov/pubmed/36506328
http://dx.doi.org/10.3389/fgene.2022.982019
author Bishara, Isaac
Chen, Jinfeng
Griffiths, Jason I.
Bild, Andrea H.
Nath, Aritro
collection PubMed
description Recent advances in single-cell RNA sequencing (scRNA-seq) technologies have been invaluable in the study of the diversity of cancer cells and the tumor microenvironment. While scRNA-seq platforms allow processing of a high number of cells, uneven read quality and technical artifacts hinder the ability to identify and classify biologically relevant cells into correct subtypes. This obstructs the analysis of cancer and normal cell diversity, while rare and low-expression cell populations may be lost by setting arbitrarily high cutoffs for UMIs when filtering out low-quality cells. To address these issues, we have developed a novel machine-learning framework that: (1) trains a cell lineage and subtype classifier using a gold standard dataset validated using marker genes; (2) systematically assesses the lowest UMI threshold that can be used in a given dataset to accurately classify cells; and (3) assigns accurate cell lineage and subtype labels to the lower read depth cells recovered by setting the optimal threshold. We demonstrate the application of this framework in a well-curated scRNA-seq dataset of breast cancer patients and two external datasets. We show that the minimum UMI threshold for the breast cancer dataset could be lowered from the original 1500 to 450, thereby increasing the total number of recovered cells by 49%, while achieving a classification accuracy of >0.9. Our framework provides a roadmap for future scRNA-seq studies to determine the optimal UMI threshold and accurately classify cells for downstream analyses.
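The threshold search in step 2 and the recovered-cell arithmetic can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how accuracy is estimated at each threshold, so the sketch takes it as a caller-supplied function. The names `lowest_accurate_threshold` and `percent_more_cells`, the accuracy table, and the UMI counts are all hypothetical toy values, not figures from the paper.

```python
from typing import Callable, Iterable, Optional, Sequence


def lowest_accurate_threshold(
    thresholds: Iterable[int],
    accuracy_at: Callable[[int], float],
    min_accuracy: float = 0.9,
) -> Optional[int]:
    """Lower the UMI cutoff step by step, stopping at the first candidate
    whose estimated accuracy is not strictly above `min_accuracy`.

    `accuracy_at` is an assumption of this sketch: any function that returns
    the classifier's estimated accuracy for cells kept at a given cutoff.
    Returns None if even the highest candidate fails.
    """
    best = None
    for t in sorted(thresholds, reverse=True):
        if accuracy_at(t) <= min_accuracy:
            break  # accuracy has dropped too far; keep the previous cutoff
        best = t
    return best


def percent_more_cells(umi_counts: Sequence[int], old: int, new: int) -> float:
    """Percent increase in retained cells when the cutoff moves from old to new."""
    kept_old = sum(1 for u in umi_counts if u >= old)
    kept_new = sum(1 for u in umi_counts if u >= new)
    if kept_old == 0:
        raise ValueError("no cells pass the original threshold")
    return 100.0 * (kept_new - kept_old) / kept_old


# Toy inputs (made up for illustration only).
toy_accuracy = {300: 0.84, 450: 0.91, 600: 0.93, 1500: 0.97}
toy_umis = [200, 400, 500, 700, 900, 1600, 1800, 2500, 3000]

chosen = lowest_accurate_threshold(toy_accuracy, toy_accuracy.__getitem__)
print(chosen)
print(percent_more_cells(toy_umis, old=1500, new=chosen))
```

Scanning from the highest cutoff downward, rather than taking the minimum over all passing candidates, avoids selecting a low cutoff that passes only by chance while intermediate cutoffs fail.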
format Online
Article
Text
id pubmed-9732024
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9732024 2022-12-10 Front Genet Genetics Frontiers Media S.A. 2022-11-25 /pmc/articles/PMC9732024/ /pubmed/36506328 http://dx.doi.org/10.3389/fgene.2022.982019 Text en Copyright © 2022 Bishara, Chen, Griffiths, Bild and Nath.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A machine learning framework for scRNA-seq UMI threshold optimization and accurate classification of cell types
topic Genetics