SPLASH: a statistical, reference-free genomic algorithm unifies biological discovery

Bibliographic Details
Main Authors: Chaung, Kaitlin, Baharav, Tavor Z., Henderson, George, Zheludev, Ivan N., Wang, Peter L., Salzman, Julia
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9258296/
https://www.ncbi.nlm.nih.gov/pubmed/35794890
http://dx.doi.org/10.1101/2022.06.24.497555
collection PubMed
description Today’s genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a new unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), an approach that directly analyzes raw sequencing data to detect a signature of regulation: sample-specific sequence variation. The approach, which includes a new statistical test, is computationally efficient and can be run at scale. SPLASH unifies detection of myriad forms of sequence variation. We demonstrate that SPLASH identifies complex mutation patterns in SARS-CoV-2 strains, discovers regulated RNA isoforms at the single cell level, documents the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a new unifying approach to genomic analysis that enables an expansive scope of discovery without metadata or references.
id pubmed-9258296
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9258296 2022-07-07 bioRxiv Article, Cold Spring Harbor Laboratory 2023-07-31. This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator.
topic Article