
SPLASH2 provides ultra-efficient, scalable, and unsupervised discovery on raw sequencing reads

SPLASH is an unsupervised, reference-free, and unifying algorithm that discovers regulated sequence variation through statistical analysis of k-mer composition, subsuming many application-specific algorithms. Here, we introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient...


Bibliographic Details
Main authors: Kokot, Marek; Dehghannasiri, Roozbeh; Baharav, Tavor; Salzman, Julia; Deorowicz, Sebastian
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055302/
https://www.ncbi.nlm.nih.gov/pubmed/36993432
http://dx.doi.org/10.1101/2023.03.17.533189
_version_ 1785015852698959872
author Kokot, Marek
Dehghannasiri, Roozbeh
Baharav, Tavor
Salzman, Julia
Deorowicz, Sebastian
collection PubMed
description SPLASH is an unsupervised, reference-free, and unifying algorithm that discovers regulated sequence variation through statistical analysis of k-mer composition, subsuming many application-specific algorithms. Here, we introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach. The pipeline has minimal installation requirements, and can be executed with a single command. SPLASH2 enables efficient analysis of massive datasets from a wide range of sequencing technologies and biological contexts at unmatched scale and speed, showcased by revealing new biology in rapid analysis of single-cell RNA-sequencing data from human muscle cells, and bulk RNA-seq from the entire Cancer Cell Line Encyclopedia (CCLE) and a study of Amyotrophic Lateral Sclerosis.
format Online
Article
Text
id pubmed-10055302
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10055302 2023-03-30 SPLASH2 provides ultra-efficient, scalable, and unsupervised discovery on raw sequencing reads Kokot, Marek; Dehghannasiri, Roozbeh; Baharav, Tavor; Salzman, Julia; Deorowicz, Sebastian bioRxiv Article SPLASH is an unsupervised, reference-free, and unifying algorithm that discovers regulated sequence variation through statistical analysis of k-mer composition, subsuming many application-specific algorithms. Here, we introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach. The pipeline has minimal installation requirements, and can be executed with a single command. SPLASH2 enables efficient analysis of massive datasets from a wide range of sequencing technologies and biological contexts at unmatched scale and speed, showcased by revealing new biology in rapid analysis of single-cell RNA-sequencing data from human muscle cells, and bulk RNA-seq from the entire Cancer Cell Line Encyclopedia (CCLE) and a study of Amyotrophic Lateral Sclerosis. Cold Spring Harbor Laboratory 2023-07-17 /pmc/articles/PMC10055302/ /pubmed/36993432 http://dx.doi.org/10.1101/2023.03.17.533189 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title SPLASH2 provides ultra-efficient, scalable, and unsupervised discovery on raw sequencing reads
topic Article