A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees
The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update...
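Main authors: | McBroome, Jakob; Thornlow, Bryan; Hinrichs, Angie S; Kramer, Alexander; De Maio, Nicola; Goldman, Nick; Haussler, David; Corbett-Detig, Russell; Turakhia, Yatish |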
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Resources |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8662617/ https://www.ncbi.nlm.nih.gov/pubmed/34469548 http://dx.doi.org/10.1093/molbev/msab264 |
_version_ | 1784613475823124480 |
---|---|
author | McBroome, Jakob; Thornlow, Bryan; Hinrichs, Angie S; Kramer, Alexander; De Maio, Nicola; Goldman, Nick; Haussler, David; Corbett-Detig, Russell; Turakhia, Yatish |
author_facet | McBroome, Jakob; Thornlow, Bryan; Hinrichs, Angie S; Kramer, Alexander; De Maio, Nicola; Goldman, Nick; Haussler, David; Corbett-Detig, Russell; Turakhia, Yatish |
author_sort | McBroome, Jakob |
collection | PubMed |
description | The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus’ evolutionary history using public data. We also present matUtils—a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively. |
format | Online Article Text |
id | pubmed-8662617 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8662617 2021-12-10 A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees McBroome, Jakob; Thornlow, Bryan; Hinrichs, Angie S; Kramer, Alexander; De Maio, Nicola; Goldman, Nick; Haussler, David; Corbett-Detig, Russell; Turakhia, Yatish Mol Biol Evol Resources The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus’ evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively. Oxford University Press 2021-09-01 /pmc/articles/PMC8662617/ /pubmed/34469548 http://dx.doi.org/10.1093/molbev/msab264 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Resources McBroome, Jakob Thornlow, Bryan Hinrichs, Angie S Kramer, Alexander De Maio, Nicola Goldman, Nick Haussler, David Corbett-Detig, Russell Turakhia, Yatish A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees |
title | A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees |
title_full | A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees |
title_fullStr | A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees |
title_full_unstemmed | A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees |
title_short | A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees |
title_sort | daily-updated database and tools for comprehensive sars-cov-2 mutation-annotated trees |
topic | Resources |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8662617/ https://www.ncbi.nlm.nih.gov/pubmed/34469548 http://dx.doi.org/10.1093/molbev/msab264 |