Fast Whole-Genome Phylogeny of the COVID-19 Virus SARS-CoV-2 by Compression

Bibliographic Details
Main Authors: Cilibrasi, Rudi L., Vitányi, Paul M.B.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8132223/
https://www.ncbi.nlm.nih.gov/pubmed/34013267
http://dx.doi.org/10.1101/2020.07.22.216242
collection PubMed
description We analyze the whole-genome phylogeny and taxonomy of the SARS-CoV-2 virus using compression. The approach is a new, fast, alignment-free method called the “normalized compression distance” (NCD) method. It discovers all effective similarities based on Kolmogorov complexity. Because Kolmogorov complexity is incomputable, we approximate it with a good compressor such as the modern zpaq. The results show that the SARS-CoV-2 virus is closest to the RaTG13 virus and similar to two bat SARS-like coronaviruses, bat-SL-CoVZXC21 and bat-SL-CoVZC45. The similarity is quantified and compared with the same quantified similarities among the mtDNA of certain species. We also address the question of whether pangolins are involved in the SARS-CoV-2 virus. The compression method is simpler and possibly faster than any other whole-genome method, which makes it the ideal tool to explore phylogeny.
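The NCD named in the description is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length produced by a real compressor standing in for Kolmogorov complexity. The following is a minimal sketch of that formula, not the authors' implementation: it assumes Python's standard `lzma` module in place of zpaq, and the function names `compressed_size`, `ncd`, and `distance_matrix` are hypothetical, chosen only for illustration.

```python
import lzma
from itertools import combinations
from typing import Dict, Tuple


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes; an lzma stand-in for C(x) (assumption: the record names zpaq)."""
    return len(lzma.compress(data, preset=9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs: Dict[str, bytes]) -> Dict[Tuple[str, str], float]:
    """Pairwise NCD for every unordered pair of named sequences."""
    return {
        (a, b): ncd(seqs[a], seqs[b])
        for a, b in combinations(sorted(seqs), 2)
    }
```

Given whole-genome sequences read as bytes, `distance_matrix` would yield the pairwise distances from which a tree could then be built; values near 0 suggest highly similar sequences and values near 1 suggest little shared structure. Real compressors carry fixed overhead, so results on very short inputs are unreliable.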
id pubmed-8132223
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8132223 2021-05-20 bioRxiv Article Cold Spring Harbor Laboratory 2020-08-26 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
topic Article