
Understanding and evaluating ambiguity in single-cell and single-nucleus RNA-sequencing

Recently, a new modification has been proposed by Hjörleifsson and Sullivan et al. to the model used to classify the splicing status of reads (as spliced (mature), unspliced (nascent), or ambiguous) in single-cell and single-nucleus RNA-seq data. Here, we evaluate both the theoretical basis and prac...


Bibliographic Details
Main authors: He, Dongze; Soneson, Charlotte; Patro, Rob
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881993/
https://www.ncbi.nlm.nih.gov/pubmed/36711921
http://dx.doi.org/10.1101/2023.01.04.522742
author He, Dongze
Soneson, Charlotte
Patro, Rob
collection PubMed
description Hjörleifsson and Sullivan et al. recently proposed a modification to the model used to classify the splicing status of reads (as spliced (mature), unspliced (nascent), or ambiguous) in single-cell and single-nucleus RNA-seq data. Here, we evaluate both the theoretical basis and the practical implementation of the proposed method. The proposed method is highly conservative and is therefore unlikely to mischaracterize reads as spliced (mature) or unspliced (nascent) when they are not. However, we find that it leaves a large fraction of reads classified as ambiguous and, in practice, allocates these ambiguous reads in an all-or-nothing manner, and differently between single-cell and single-nucleus RNA-seq data. Further, as implemented in practice, the ambiguous classification is implicit and based on the index against which the reads are mapped. This leads to several drawbacks compared to methods that consider both spliced (mature) and unspliced (nascent) mapping targets simultaneously; for example, it forgoes the ability to use confidently assigned reads to rescue ambiguous reads based on shared UMIs and gene targets. Nonetheless, we show that these conservative assignment rules can be obtained directly in existing approaches simply by altering the set of targets that are indexed. To this end, we introduce the spliceu reference and show that its use with alevin-fry recapitulates the more conservative proposed classification. We also observe that, on experimental data and under the proposed allocation rules for ambiguous UMIs, the difference between the proposed classification scheme and existing conventions appears much smaller than previously reported. We demonstrate the use of the new piscem index for mapping simultaneously against spliced (mature) and unspliced (nascent) targets, allowing classification against the full nascent and mature transcriptome in human or mouse in less than 3 GB of memory.
Finally, we discuss the potential of incorporating probabilistic evidence into the inference of splicing status, and suggest that it may provide benefits beyond what can be obtained from discrete classification of UMIs as splicing-ambiguous.
format Online
Article
Text
id pubmed-9881993
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-9881993 2023-01-28 Understanding and evaluating ambiguity in single-cell and single-nucleus RNA-sequencing He, Dongze Soneson, Charlotte Patro, Rob bioRxiv Article Cold Spring Harbor Laboratory 2023-01-04 /pmc/articles/PMC9881993/ /pubmed/36711921 http://dx.doi.org/10.1101/2023.01.04.522742 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title Understanding and evaluating ambiguity in single-cell and single-nucleus RNA-sequencing
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881993/
https://www.ncbi.nlm.nih.gov/pubmed/36711921
http://dx.doi.org/10.1101/2023.01.04.522742