Reproducibly sampling SARS-CoV-2 genomes across time, geography, and viral diversity

The COVID-19 pandemic has led to a rapid accumulation of SARS-CoV-2 genomes, enabling genomic epidemiology on local and global scales. Collections of genomes from resources such as GISAID must be subsampled to enable computationally feasible phylogenetic and other analyses. We present genome-sampler...

Bibliographic Details
Main Authors: Bolyen, Evan, Dillon, Matthew R., Bokulich, Nicholas A., Ladner, Jason T., Larsen, Brendan B., Hepp, Crystal M., Lemmer, Darrin, Sahl, Jason W., Sanchez, Andrew, Holdgraf, Chris, Sewell, Chris, Choudhury, Aakash G., Stachurski, John, McKay, Matthew, Simard, Anthony, Engelthaler, David M., Worobey, Michael, Keim, Paul, Caporaso, J. Gregory
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7814287/
https://www.ncbi.nlm.nih.gov/pubmed/33500774
http://dx.doi.org/10.12688/f1000research.24751.2
author Bolyen, Evan
Dillon, Matthew R.
Bokulich, Nicholas A.
Ladner, Jason T.
Larsen, Brendan B.
Hepp, Crystal M.
Lemmer, Darrin
Sahl, Jason W.
Sanchez, Andrew
Holdgraf, Chris
Sewell, Chris
Choudhury, Aakash G.
Stachurski, John
McKay, Matthew
Simard, Anthony
Engelthaler, David M.
Worobey, Michael
Keim, Paul
Caporaso, J. Gregory
author_sort Bolyen, Evan
collection PubMed
description The COVID-19 pandemic has led to a rapid accumulation of SARS-CoV-2 genomes, enabling genomic epidemiology on local and global scales. Collections of genomes from resources such as GISAID must be subsampled to enable computationally feasible phylogenetic and other analyses. We present genome-sampler, a software package that supports sampling collections of viral genomes across multiple axes including time of genome isolation, location of genome isolation, and viral diversity. The software is modular in design so that these or future sampling approaches can be applied independently and combined (or replaced with a random sampling approach) to facilitate custom workflows and benchmarking. genome-sampler is written as a QIIME 2 plugin, ensuring that its application is fully reproducible through QIIME 2’s unique retrospective data provenance tracking system. genome-sampler can be installed in a conda environment on macOS or Linux systems. A complete default pipeline is available through a Snakemake workflow, so subsampling can be achieved using a single command. genome-sampler is open source, free for all to use, and available at https://caporasolab.us/genome-sampler. We hope that this will facilitate SARS-CoV-2 research and support evaluation of viral genome sampling approaches for genomic epidemiology.
format Online
Article
Text
id pubmed-7814287
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-7814287 2021-01-25 Reproducibly sampling SARS-CoV-2 genomes across time, geography, and viral diversity F1000Res Software Tool Article F1000 Research Limited 2020-10-28 /pmc/articles/PMC7814287/ /pubmed/33500774 http://dx.doi.org/10.12688/f1000research.24751.2 Text en Copyright: © 2020 Bolyen E et al.
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Reproducibly sampling SARS-CoV-2 genomes across time, geography, and viral diversity
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7814287/
https://www.ncbi.nlm.nih.gov/pubmed/33500774
http://dx.doi.org/10.12688/f1000research.24751.2