
RASCL: Rapid Assessment of Selection in CLades through molecular sequence analysis

An important unmet need revealed by the COVID-19 pandemic is the near-real-time identification of potentially fitness-altering mutations within rapidly growing SARS-CoV-2 lineages. Although powerful molecular sequence analysis methods are available to detect and characterize patterns of natural sele...


Bibliographic Details
Main authors: Lucaci, Alexander G., Zehr, Jordan D., Shank, Stephen D., Bouvier, Dave, Ostrovsky, Alexander, Mei, Han, Nekrutenko, Anton, Martin, Darren P., Kosakovsky Pond, Sergei L.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9629619/
https://www.ncbi.nlm.nih.gov/pubmed/36322581
http://dx.doi.org/10.1371/journal.pone.0275623
author Lucaci, Alexander G.
Zehr, Jordan D.
Shank, Stephen D.
Bouvier, Dave
Ostrovsky, Alexander
Mei, Han
Nekrutenko, Anton
Martin, Darren P.
Kosakovsky Pond, Sergei L.
collection PubMed
description An important unmet need revealed by the COVID-19 pandemic is the near-real-time identification of potentially fitness-altering mutations within rapidly growing SARS-CoV-2 lineages. Although powerful molecular sequence analysis methods are available to detect and characterize patterns of natural selection within modestly sized gene-sequence datasets, the computational complexity of these methods and their sensitivity to sequencing errors render them effectively inapplicable in large-scale genomic surveillance contexts. Motivated by the need to analyze new lineage evolution in near real time using large numbers of genomes, we developed the Rapid Assessment of Selection within CLades (RASCL) pipeline. RASCL applies state-of-the-art phylogenetic comparative methods to evaluate selective processes acting at individual codon sites and across whole genes. RASCL is scalable and produces regular, automatically updated, lineage-specific selection analysis reports, even for lineages that include tens or hundreds of thousands of sampled genome sequences. Key to this performance are (i) the generation of automatically subsampled, high-quality datasets of gene/ORF sequences drawn from a selected “query” viral lineage; (ii) the contextualization of these query sequences in codon alignments that include high-quality “background” sequences representative of global SARS-CoV-2 diversity; and (iii) the extensive parallelization of a suite of computationally intensive selection analysis tests. Within hours of being deployed to analyze a novel, rapidly growing lineage of interest, RASCL will begin yielding JavaScript Object Notation (JSON)-formatted reports that can be either imported into third-party analysis software or explored in standard web browsers using the premade RASCL interactive data visualization dashboard. By enabling the rapid detection of genome sites evolving under different selective regimes, RASCL is well suited for near-real-time monitoring of the population-level selective processes that will likely underlie the emergence of future variants of concern in measurably evolving pathogens with extensive genomic surveillance.
format Online
Article
Text
id pubmed-9629619
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9629619 2022-11-03 RASCL: Rapid Assessment of Selection in CLades through molecular sequence analysis PLoS One Research Article Public Library of Science 2022-11-02 /pmc/articles/PMC9629619/ /pubmed/36322581 http://dx.doi.org/10.1371/journal.pone.0275623 Text en © 2022 Lucaci et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title RASCL: Rapid Assessment of Selection in CLades through molecular sequence analysis
topic Research Article