A machine learning framework for scRNA-seq UMI threshold optimization and accurate classification of cell types
Recent advances in single cell RNA sequencing (scRNA-seq) technologies have been invaluable in the study of the diversity of cancer cells and the tumor microenvironment. While scRNA-seq platforms allow processing of a high number of cells, uneven read quality and technical artifacts hinder the abili...
Main authors: | Bishara, Isaac; Chen, Jinfeng; Griffiths, Jason I.; Bild, Andrea H.; Nath, Aritro |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9732024/ https://www.ncbi.nlm.nih.gov/pubmed/36506328 http://dx.doi.org/10.3389/fgene.2022.982019 |
_version_ | 1784846036104118272 |
---|---|
author | Bishara, Isaac Chen, Jinfeng Griffiths, Jason I. Bild, Andrea H. Nath, Aritro |
collection | PubMed |
description | Recent advances in single cell RNA sequencing (scRNA-seq) technologies have been invaluable in the study of the diversity of cancer cells and the tumor microenvironment. While scRNA-seq platforms allow processing of a large number of cells, uneven read quality and technical artifacts hinder the ability to identify and classify biologically relevant cells into correct subtypes. This obstructs the analysis of cancer and normal cell diversity, and rare and low-expression cell populations may be lost when arbitrarily high UMI cutoffs are set to filter out low-quality cells. To address these issues, we have developed a novel machine-learning framework that: 1. Trains a cell lineage and subtype classifier using a gold-standard dataset validated using marker genes; 2. Systematically assesses the lowest UMI threshold that can be used in a given dataset to accurately classify cells; 3. Assigns accurate cell lineage and subtype labels to the lower-read-depth cells recovered by setting the optimal threshold. We demonstrate the application of this framework in a well-curated scRNA-seq dataset of breast cancer patients and two external datasets. We show that the minimum UMI threshold for the breast cancer dataset could be lowered from the original 1500 to 450, thereby increasing the total number of recovered cells by 49%, while achieving a classification accuracy of >0.9. Our framework provides a roadmap for future scRNA-seq studies to determine the optimal UMI threshold and accurately classify cells for downstream analyses. |
format | Online Article Text |
id | pubmed-9732024 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9732024 2022-12-10 A machine learning framework for scRNA-seq UMI threshold optimization and accurate classification of cell types Bishara, Isaac Chen, Jinfeng Griffiths, Jason I. Bild, Andrea H. Nath, Aritro Front Genet Genetics Frontiers Media S.A. 2022-11-25 /pmc/articles/PMC9732024/ /pubmed/36506328 http://dx.doi.org/10.3389/fgene.2022.982019 Text en Copyright © 2022 Bishara, Chen, Griffiths, Bild and Nath. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | A machine learning framework for scRNA-seq UMI threshold optimization and accurate classification of cell types |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9732024/ https://www.ncbi.nlm.nih.gov/pubmed/36506328 http://dx.doi.org/10.3389/fgene.2022.982019 |
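
The threshold search summarized in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the candidate thresholds, and the toy data are all hypothetical, and the sketch assumes that each cell already has a UMI count and a flag saying whether a trained classifier labeled it correctly. It scans candidate UMI cutoffs, keeps those whose retained cells reach a target accuracy, and reports the lowest one together with the percentage gain in retained cells.

```python
from typing import Optional, Sequence


def accuracy_at_threshold(
    umis: Sequence[int], correct: Sequence[bool], threshold: int
) -> Optional[float]:
    """Fraction of correctly classified cells among cells with UMI >= threshold."""
    kept = [c for u, c in zip(umis, correct) if u >= threshold]
    return sum(kept) / len(kept) if kept else None


def lowest_accurate_threshold(
    umis: Sequence[int],
    correct: Sequence[bool],
    candidates: Sequence[int],
    min_accuracy: float = 0.9,
) -> Optional[int]:
    """Smallest candidate cutoff whose retained cells reach min_accuracy, else None."""
    passing = []
    for t in candidates:
        acc = accuracy_at_threshold(umis, correct, t)
        if acc is not None and acc >= min_accuracy:
            passing.append(t)
    return min(passing) if passing else None


def percent_more_cells(umis: Sequence[int], old: int, new: int) -> float:
    """Percentage increase in retained cells when the cutoff moves from old to new."""
    n_old = sum(u >= old for u in umis)
    n_new = sum(u >= new for u in umis)
    if n_old == 0:
        raise ValueError("no cells pass the old threshold")
    return 100.0 * (n_new - n_old) / n_old


# Invented toy data (not from the paper): UMI count per cell and whether
# a hypothetical classifier assigned the correct label to that cell.
umis = [200, 300, 450, 600, 900, 1500, 1800, 2500, 4000, 5000]
correct = [False, False, True, True, True, True, True, True, True, True]
candidates = [200, 300, 450, 900, 1500]

best = lowest_accurate_threshold(umis, correct, candidates)
print(best, percent_more_cells(umis, 1500, best))
```

On this toy input the search settles on a cutoff of 450, but the gain in retained cells it reports reflects only the invented data and is unrelated to the 49% figure in the record. In practice the correctness flags would come from evaluating a classifier against marker-gene-validated labels, which this sketch does not model.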