Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics
BACKGROUND: Quality control (QC) of cells, a critical first step in single-cell RNA sequencing data analysis, has largely relied on arbitrarily fixed data-agnostic thresholds applied to QC metrics such as gene complexity and fraction of reads mapping to mitochondrial genes. The few existing data-dri...
Main authors: | Subramanian, Ayshwarya; Alperovich, Mikhail; Yang, Yiming; Li, Bo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2022
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9793662/ https://www.ncbi.nlm.nih.gov/pubmed/36575523 http://dx.doi.org/10.1186/s13059-022-02820-w |
_version_ | 1784859884516278272 |
---|---|
author | Subramanian, Ayshwarya Alperovich, Mikhail Yang, Yiming Li, Bo |
author_facet | Subramanian, Ayshwarya Alperovich, Mikhail Yang, Yiming Li, Bo |
author_sort | Subramanian, Ayshwarya |
collection | PubMed |
description | BACKGROUND: Quality control (QC) of cells, a critical first step in single-cell RNA sequencing data analysis, has largely relied on arbitrarily fixed data-agnostic thresholds applied to QC metrics such as gene complexity and fraction of reads mapping to mitochondrial genes. The few existing data-driven approaches perform QC at the level of samples or studies without accounting for biological variation. RESULTS: We first demonstrate that QC metrics vary with both tissue and cell types across technologies, study conditions, and species. We then propose data-driven QC (ddqc), an unsupervised adaptive QC framework to perform flexible and data-driven QC at the level of cell types while retaining critical biological insights and improved power for downstream analysis. ddqc applies an adaptive threshold based on the median absolute deviation on four QC metrics (gene and UMI complexity, fraction of reads mapping to mitochondrial and ribosomal genes). ddqc retains over a third more cells when compared to conventional data-agnostic QC filters. Finally, we show that ddqc recovers biologically meaningful trends in gradation of gene complexity among cell types that can help answer questions of biological interest such as which cell types express the least and most number of transcripts overall, and ribosomal transcripts specifically. CONCLUSIONS: ddqc retains cell types such as metabolically active parenchymal cells and specialized cells such as neutrophils which are often lost by conventional QC. Taken together, our work proposes a revised paradigm to quality filtering best practices—iterative QC, providing a data-driven QC framework compatible with observed biological diversity. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-022-02820-w. |
format | Online Article Text |
id | pubmed-9793662 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-97936622022-12-28 Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics Subramanian, Ayshwarya Alperovich, Mikhail Yang, Yiming Li, Bo Genome Biol Research BACKGROUND: Quality control (QC) of cells, a critical first step in single-cell RNA sequencing data analysis, has largely relied on arbitrarily fixed data-agnostic thresholds applied to QC metrics such as gene complexity and fraction of reads mapping to mitochondrial genes. The few existing data-driven approaches perform QC at the level of samples or studies without accounting for biological variation. RESULTS: We first demonstrate that QC metrics vary with both tissue and cell types across technologies, study conditions, and species. We then propose data-driven QC (ddqc), an unsupervised adaptive QC framework to perform flexible and data-driven QC at the level of cell types while retaining critical biological insights and improved power for downstream analysis. ddqc applies an adaptive threshold based on the median absolute deviation on four QC metrics (gene and UMI complexity, fraction of reads mapping to mitochondrial and ribosomal genes). ddqc retains over a third more cells when compared to conventional data-agnostic QC filters. Finally, we show that ddqc recovers biologically meaningful trends in gradation of gene complexity among cell types that can help answer questions of biological interest such as which cell types express the least and most number of transcripts overall, and ribosomal transcripts specifically. CONCLUSIONS: ddqc retains cell types such as metabolically active parenchymal cells and specialized cells such as neutrophils which are often lost by conventional QC. Taken together, our work proposes a revised paradigm to quality filtering best practices—iterative QC, providing a data-driven QC framework compatible with observed biological diversity. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-022-02820-w. BioMed Central 2022-12-27 /pmc/articles/PMC9793662/ /pubmed/36575523 http://dx.doi.org/10.1186/s13059-022-02820-w Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Subramanian, Ayshwarya Alperovich, Mikhail Yang, Yiming Li, Bo Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics |
title | Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics |
title_full | Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics |
title_fullStr | Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics |
title_full_unstemmed | Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics |
title_short | Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics |
title_sort | biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9793662/ https://www.ncbi.nlm.nih.gov/pubmed/36575523 http://dx.doi.org/10.1186/s13059-022-02820-w |
work_keys_str_mv | AT subramanianayshwarya biologyinspireddatadrivenqualitycontrolforscientificdiscoveryinsinglecelltranscriptomics AT alperovichmikhail biologyinspireddatadrivenqualitycontrolforscientificdiscoveryinsinglecelltranscriptomics AT yangyiming biologyinspireddatadrivenqualitycontrolforscientificdiscoveryinsinglecelltranscriptomics AT libo biologyinspireddatadrivenqualitycontrolforscientificdiscoveryinsinglecelltranscriptomics |
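
The description field in this record says that ddqc applies an adaptive threshold based on the median absolute deviation (MAD) at the level of cell types. The sketch below illustrates that general idea only; it is not the ddqc package's code or API. The function name `mad_outlier_mask`, the `n_mads=2.0` default, the use of precomputed cluster labels as a stand-in for cell types, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np


def mad_outlier_mask(values, clusters, n_mads=2.0, direction="upper"):
    """Flag cells whose QC metric is extreme relative to their own cluster.

    Hypothetical helper, not part of any published package.
    values    : one QC metric per cell (e.g. percent mitochondrial reads)
    clusters  : one cluster label per cell (assumed proxy for cell type)
    n_mads    : how many MADs from the cluster median count as an outlier
                (an illustrative default, not taken from the record)
    direction : "upper" flags high values, "lower" flags low values
    """
    values = np.asarray(values, dtype=float)
    clusters = np.asarray(clusters)
    if direction not in ("upper", "lower"):
        raise ValueError("direction must be 'upper' or 'lower'")

    outlier = np.zeros(values.shape[0], dtype=bool)
    for label in np.unique(clusters):
        in_cluster = clusters == label
        cluster_values = values[in_cluster]
        median = np.median(cluster_values)
        # Unscaled MAD: median of absolute deviations from the cluster median.
        mad = np.median(np.abs(cluster_values - median))
        if direction == "upper":
            outlier[in_cluster] = cluster_values > median + n_mads * mad
        else:
            outlier[in_cluster] = cluster_values < median - n_mads * mad
    return outlier


# Toy data: cluster 1 has a naturally high mitochondrial fraction.
percent_mito = [2.0, 3.0, 4.0, 30.0, 40.0, 42.0, 44.0, 90.0]
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
mask = mad_outlier_mask(percent_mito, cluster_ids, direction="upper")
print(mask.tolist())
```

In this toy example the cells at 40 to 44 percent are kept because they are typical for their own cluster, whereas a single fixed cutoff such as 10 percent would discard all of cluster 1. That is the kind of cell-type-level behaviour the description contrasts with data-agnostic thresholds.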