
Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult

Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of...


Bibliographic Details
Main Authors: Morel, Benoit; Barbera, Pierre; Czech, Lucas; Bettisworth, Ben; Hübner, Lukas; Lutteropp, Sarah; Serdari, Dora; Kostaki, Evangelia-Georgia; Mamais, Ioannis; Kozlov, Alexey M; Pavlidis, Pavlos; Paraskevis, Dimitrios; Stamatakis, Alexandros
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7798910/
https://www.ncbi.nlm.nih.gov/pubmed/33316067
http://dx.doi.org/10.1093/molbev/msaa314
_version_ 1783635085685161984
author Morel, Benoit
Barbera, Pierre
Czech, Lucas
Bettisworth, Ben
Hübner, Lukas
Lutteropp, Sarah
Serdari, Dora
Kostaki, Evangelia-Georgia
Mamais, Ioannis
Kozlov, Alexey M
Pavlidis, Pavlos
Paraskevis, Dimitrios
Stamatakis, Alexandros
author_sort Morel, Benoit
collection PubMed
description Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.
format Online
Article
Text
id pubmed-7798910
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7798910 2021-01-25 Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult Morel, Benoit Barbera, Pierre Czech, Lucas Bettisworth, Ben Hübner, Lukas Lutteropp, Sarah Serdari, Dora Kostaki, Evangelia-Georgia Mamais, Ioannis Kozlov, Alexey M Pavlidis, Pavlos Paraskevis, Dimitrios Stamatakis, Alexandros Mol Biol Evol Discoveries Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution. Oxford University Press 2020-12-15 /pmc/articles/PMC7798910/ /pubmed/33316067 http://dx.doi.org/10.1093/molbev/msaa314 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
title Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult
topic Discoveries
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7798910/
https://www.ncbi.nlm.nih.gov/pubmed/33316067
http://dx.doi.org/10.1093/molbev/msaa314