
SECEDO: SNV-based subclone detection using ultra-low coverage single-cell DNA sequencing

Bibliographic Details
Main Authors: Rozhoňová, Hana, Danciu, Daniel, Stark, Stefan, Rätsch, Gunnar, Kahles, André, Lehmann, Kjong-Van
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9477524/
https://www.ncbi.nlm.nih.gov/pubmed/35900151
http://dx.doi.org/10.1093/bioinformatics/btac510
collection PubMed
description MOTIVATION: Several recently developed single-cell DNA sequencing technologies enable whole-genome sequencing of thousands of cells. However, the ultra-low coverage of the resulting data (<0.05× per cell) largely limits their use to identifying copy number alterations in multi-megabase segments. Many tumors are not copy number-driven, so single-nucleotide variant (SNV)-based subclone detection may contribute to a more comprehensive view of intra-tumor heterogeneity. Because of the low coverage, SNVs can only be identified by superimposing the sequenced genomes of hundreds of genetically similar cells. We therefore developed an approach that efficiently clusters tumor cells by using Bayesian filtering to select relevant loci and by exploiting read overlap and phasing. RESULTS: We developed Single Cell Data Tumor Clusterer (SECEDO, Latin for ‘to separate’), a new method that clusters tumor cells based solely on SNVs inferred from ultra-low coverage single-cell DNA sequencing data. Applied to a synthetic dataset simulating 7250 cells and eight tumor subclones from a single patient, SECEDO accurately reconstructed the clonal composition and detected 92.11% of the somatic SNVs, with the smallest clusters representing only 6.9% of the total population. When applied to five real single-cell sequencing datasets from a breast cancer patient, each consisting of ~2000 cells, SECEDO recovered the major clonal composition of each dataset at the original coverage of 0.03×, achieving an Adjusted Rand Index (ARI) of ~0.6.
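The ARI reported above measures how well an inferred clustering of cells agrees with a reference labeling of the same cells. As an illustration only, the sketch below implements the standard Hubert and Arabie formula from the contingency table of the two labelings; it is not code from SECEDO or its evaluation repository, and the function name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same cells.

    Hypothetical helper illustrating the Hubert and Arabie formula; not SECEDO code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_true)
    if n < 2:
        # With fewer than two cells there are no pairs to compare; treat as a perfect match.
        return 1.0

    # Contingency table n_ij: number of cells with true label i and predicted label j.
    contingency = Counter(zip(labels_true, labels_pred))
    # Marginals a_i (size of each true cluster) and b_j (size of each predicted cluster).
    a = Counter(labels_true)
    b = Counter(labels_pred)

    # Pairs of cells grouped together in both labelings, and within each marginal.
    index = sum(comb(n_ij, 2) for n_ij in contingency.values())
    sum_a = sum(comb(a_i, 2) for a_i in a.values())
    sum_b = sum(comb(b_j, 2) for b_j in b.values())

    # Expected index for random labelings with the same cluster sizes.
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put all cells in a single cluster.
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    truth = ["A", "A", "A", "B", "B", "C"]
    inferred = [0, 0, 1, 1, 1, 2]
    print(adjusted_rand_index(truth, inferred))
```

A score of 1 means the two partitions agree exactly up to relabeling, and values near 0 mean agreement no better than chance; this is the scale on which SECEDO's result and the comparison with the state-of-the-art method should be read. Where scikit-learn is available, `sklearn.metrics.adjusted_rand_score` computes the same quantity.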
The current state-of-the-art SNV-based clustering method achieved an ARI of ~0, even after merging cells to create higher-coverage data (a 10-fold increase), and only matched SECEDO's performance when data from all five datasets were pooled and the sequencing coverage was additionally increased artificially by a factor of 7. Variant calling on the resulting clusters recovered more than twice as many SNVs as calling on all cells together. Furthermore, the allelic ratio of the SNVs called on each subcluster was more than twice that of the SNVs called without clustering. Calling variants on subclones therefore not only increases the sensitivity of SNV detection and attaches SNVs to specific subclones, but also significantly increases confidence in the called variants. AVAILABILITY AND IMPLEMENTATION: SECEDO is implemented in C++ and is publicly available at https://github.com/ratschlab/secedo. Instructions for downloading the data, together with the evaluation code for reproducing the findings in this paper, are available at https://github.com/ratschlab/secedo-evaluation. The code and data of the submitted version are archived at https://doi.org/10.5281/zenodo.6516955. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
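Why SNVs only become visible after pooling many cells, and why calling on clusters helps, follows from simple coverage arithmetic. The sketch below assumes an idealized model that the abstract does not state: reads fall uniformly at random, so the depth at a locus is Poisson-distributed with mean equal to the per-cell coverage times the number of pooled cells. Real single-cell amplification is far less uniform, and the helper names `pooled_coverage` and `prob_depth_at_least` are hypothetical.

```python
import math


def pooled_coverage(per_cell_coverage, n_cells):
    """Mean depth (x) at a locus after pooling the reads of n_cells cells.

    Hypothetical helper; assumes coverage simply adds up across cells.
    """
    return per_cell_coverage * n_cells


def prob_depth_at_least(k, mean_depth):
    """P(depth >= k) assuming depth ~ Poisson(mean_depth), an idealized uniform-coverage model."""
    # Complement of the probability of observing 0 .. k-1 reads at the locus.
    below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - below


if __name__ == "__main__":
    per_cell = 0.03  # per-cell coverage quoted for the real breast cancer datasets
    for n_cells in (1, 10, 100, 500):
        depth = pooled_coverage(per_cell, n_cells)
        print(
            f"{n_cells:4d} cells: mean depth {depth:5.2f}x, "
            f"P(>=1 read) = {prob_depth_at_least(1, depth):.3f}, "
            f"P(>=3 reads) = {prob_depth_at_least(3, depth):.3f}"
        )
```

Under this model a single cell at 0.03× covers a given locus with probability 1 − e^(−0.03), roughly 3%, whereas pooling 100 such cells gives a mean depth of 3× and covers the locus with probability 1 − e^(−3) ≈ 0.95.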
id pubmed-9477524
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9477524 2022-09-19
Bioinformatics
Original Papers
Oxford University Press 2022-07-28
© The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Papers