Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult
Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of...
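Main authors: | Morel, Benoit; Barbera, Pierre; Czech, Lucas; Bettisworth, Ben; Hübner, Lukas; Lutteropp, Sarah; Serdari, Dora; Kostaki, Evangelia-Georgia; Mamais, Ioannis; Kozlov, Alexey M; Pavlidis, Pavlos; Paraskevis, Dimitrios; Stamatakis, Alexandros |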
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Discoveries |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7798910/ https://www.ncbi.nlm.nih.gov/pubmed/33316067 http://dx.doi.org/10.1093/molbev/msaa314 |
_version_ | 1783635085685161984 |
---|---|
author | Morel, Benoit; Barbera, Pierre; Czech, Lucas; Bettisworth, Ben; Hübner, Lukas; Lutteropp, Sarah; Serdari, Dora; Kostaki, Evangelia-Georgia; Mamais, Ioannis; Kozlov, Alexey M; Pavlidis, Pavlos; Paraskevis, Dimitrios; Stamatakis, Alexandros |
author_facet | Morel, Benoit; Barbera, Pierre; Czech, Lucas; Bettisworth, Ben; Hübner, Lukas; Lutteropp, Sarah; Serdari, Dora; Kostaki, Evangelia-Georgia; Mamais, Ioannis; Kozlov, Alexey M; Pavlidis, Pavlos; Paraskevis, Dimitrios; Stamatakis, Alexandros |
author_sort | Morel, Benoit |
collection | PubMed |
description | Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution. |
format | Online Article Text |
id | pubmed-7798910 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-77989102021-01-25 Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult Morel, Benoit Barbera, Pierre Czech, Lucas Bettisworth, Ben Hübner, Lukas Lutteropp, Sarah Serdari, Dora Kostaki, Evangelia-Georgia Mamais, Ioannis Kozlov, Alexey M Pavlidis, Pavlos Paraskevis, Dimitrios Stamatakis, Alexandros Mol Biol Evol Discoveries Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution. Oxford University Press 2020-12-15 /pmc/articles/PMC7798910/ /pubmed/33316067 http://dx.doi.org/10.1093/molbev/msaa314 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Discoveries Morel, Benoit Barbera, Pierre Czech, Lucas Bettisworth, Ben Hübner, Lukas Lutteropp, Sarah Serdari, Dora Kostaki, Evangelia-Georgia Mamais, Ioannis Kozlov, Alexey M Pavlidis, Pavlos Paraskevis, Dimitrios Stamatakis, Alexandros Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult |
title | Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult |
title_full | Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult |
title_fullStr | Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult |
title_full_unstemmed | Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult |
title_short | Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult |
title_sort | phylogenetic analysis of sars-cov-2 data is difficult |
topic | Discoveries |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7798910/ https://www.ncbi.nlm.nih.gov/pubmed/33316067 http://dx.doi.org/10.1093/molbev/msaa314 |