SPLASH2 provides ultra-efficient, scalable, and unsupervised discovery on raw sequencing reads
Main authors: | , , , , |
---|---|
Format: | Online article (text) |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055302/ https://www.ncbi.nlm.nih.gov/pubmed/36993432 http://dx.doi.org/10.1101/2023.03.17.533189 |
Summary: | SPLASH is an unsupervised, reference-free, and unifying algorithm that discovers regulated sequence variation through statistical analysis of k-mer composition, subsuming many application-specific algorithms. Here, we introduce SPLASH2, a fast, scalable implementation of SPLASH based on an efficient k-mer counting approach. The pipeline has minimal installation requirements and can be executed with a single command. SPLASH2 enables efficient analysis of massive datasets from a wide range of sequencing technologies and biological contexts at unmatched scale and speed, showcased by revealing new biology in rapid analysis of single-cell RNA-sequencing data from human muscle cells, and bulk RNA-seq from the entire Cancer Cell Line Encyclopedia (CCLE) and a study of Amyotrophic Lateral Sclerosis. |
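The summary says SPLASH works from k-mer composition and that SPLASH2 is built on an efficient k-mer counting step. As a rough illustration only, the sketch below counts overlapping k-mers per sample in plain Python. It is not SPLASH2's implementation: the function names `count_kmers` and `kmer_table` are hypothetical, skipping windows that contain non-ACGT characters is an assumption, and the statistical test SPLASH applies on top of such counts is not shown.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical helper, not part of SPLASH2: a naive illustration of k-mer counting.
def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer in a collection of reads.

    Assumption for this sketch: windows containing characters other than
    A, C, G, T (e.g. 'N') are skipped rather than counted.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    valid = set("ACGT")
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of length k across the read, one base at a time.
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts


# Hypothetical helper: one k-mer Counter per sample, keyed by sample name.
def kmer_table(samples: Dict[str, Iterable[str]], k: int) -> Dict[str, Counter]:
    """Build a per-sample k-mer count table (sample name -> Counter)."""
    return {name: count_kmers(reads, k) for name, reads in samples.items()}
```

For example, `kmer_table({"s1": ["ACGTAC"], "s2": ["ACGTTT"]}, 3)` gives both samples one count each of ACG and CGT, while GTA and TAC appear only in s1 and GTT and TTT only in s2. Per-sample differences of this kind are the raw material a statistical comparison of k-mer composition would work from. A dictionary of `Counter` objects is easy to read but would not scale to the dataset sizes mentioned in the summary, which attributes SPLASH2's speed to a more efficient counting approach.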