
Stability of SARS-CoV-2 phylogenies

The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host vi...


Bibliographic Details
Main authors: Turakhia, Yatish, De Maio, Nicola, Thornlow, Bryan, Gozashti, Landen, Lanfear, Robert, Walker, Conor R., Hinrichs, Angie S., Fernandes, Jason D., Borges, Rui, Slodkowicz, Greg, Weilguny, Lukas, Haussler, David, Goldman, Nick, Corbett-Detig, Russell
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7721162/
https://www.ncbi.nlm.nih.gov/pubmed/33206635
http://dx.doi.org/10.1371/journal.pgen.1009175
collection PubMed
description The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes.
Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse.
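The screening idea in the abstract (mutations reported predominantly or exclusively by a single lab, followed by removal of the problematic variants) can be illustrated with a minimal Python sketch. This is not the authors' toolkit or method: it counts samples rather than recurrences on a tree, and the function names `flag_lab_specific` and `mask_sites`, the thresholds `min_count` and `min_share`, and the toy mutation and lab labels are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def flag_lab_specific(observations, min_count=3, min_share=0.8):
    """Flag mutations whose observations come mostly from one lab.

    observations: iterable of (mutation, lab) pairs, one pair per sample
    carrying the mutation. Thresholds are illustrative assumptions.
    Returns {mutation: (dominant_lab, share_of_observations)}.
    """
    labs_by_mutation = defaultdict(Counter)
    for mutation, lab in observations:
        labs_by_mutation[mutation][lab] += 1

    flagged = {}
    for mutation, labs in labs_by_mutation.items():
        total = sum(labs.values())
        top_lab, top_count = labs.most_common(1)[0]
        share = top_count / total
        # Require enough observations and a dominant submitting lab.
        if total >= min_count and share >= min_share:
            flagged[mutation] = (top_lab, share)
    return flagged


def mask_sites(sequence, sites, mask_char="N"):
    """Replace the given 1-based positions of a sequence with mask_char."""
    chars = list(sequence)
    for pos in sites:
        if 1 <= pos <= len(chars):
            chars[pos - 1] = mask_char
    return "".join(chars)


if __name__ == "__main__":
    # Toy, made-up observations: not real SARS-CoV-2 data.
    toy = [("C100T", "labA")] * 4 + [("C100T", "labB")]
    toy += [("G200A", "labA"), ("G200A", "labB"), ("G200A", "labC")]
    print(flag_lab_specific(toy))
    print(mask_sites("ACGTACGT", [2, 8]))
```

In practice the sites to mask would come from a curated list such as the one discussed at the virological.org links in the abstract, rather than from a threshold chosen ad hoc as in this sketch.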
format Online
Article
Text
id pubmed-7721162
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7721162 2020-12-15 PLoS Genet Research Article Public Library of Science 2020-11-18 /pmc/articles/PMC7721162/ /pubmed/33206635 http://dx.doi.org/10.1371/journal.pgen.1009175 Text en
© 2020 Turakhia et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Stability of SARS-CoV-2 phylogenies
topic Research Article