
Alignment-free sequence comparison for virus genomes based on location correlation coefficient

Coronaviruses (especially SARS-CoV-2) are characterized by rapid mutation and wide spread. As these characteristics easily lead to global pandemics, studying the evolutionary relationship between viruses is essential for clinical diagnosis. DNA sequencing has played an important role in evolutionary...


Bibliographic Details
Main authors: He, Lily, Sun, Siyang, Zhang, Qianyue, Bao, Xiaona, Li, Peter K.
Format: Online Article Text
Language: English
Published: Published by Elsevier B.V. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493760/
https://www.ncbi.nlm.nih.gov/pubmed/34626822
http://dx.doi.org/10.1016/j.meegid.2021.105106
_version_ 1784579182675623936
author He, Lily
Sun, Siyang
Zhang, Qianyue
Bao, Xiaona
Li, Peter K.
author_facet He, Lily
Sun, Siyang
Zhang, Qianyue
Bao, Xiaona
Li, Peter K.
author_sort He, Lily
collection PubMed
description Coronaviruses (especially SARS-CoV-2) are characterized by rapid mutation and wide spread. As these characteristics easily lead to global pandemics, studying the evolutionary relationship between viruses is essential for clinical diagnosis. DNA sequencing has played an important role in evolutionary analysis. Recent alignment-free methods can overcome the problems of traditional alignment-based methods, which consume both time and space. This paper proposes a novel alignment-free method called the correlation coefficient feature vector (CCFV), which defines a correlation measure of the L-step delay of a nucleotide location from its location in the original DNA sequence. The numerical feature is a 16 × L-dimensional numerical vector describing the distribution characteristics of the nucleotide positions in a DNA sequence. The proposed L-step delay correlation measure is interestingly related to some types of L + 1 spaced mers. Unlike traditional gene comparison, our method avoids the computational complexity of multiple sequence alignment, and hence improves the speed of sequence comparison. Our method is applied to evolutionary analysis of the common human viruses including SARS-CoV-2, Dengue virus, Hepatitis B virus, and human rhinovirus and achieves the same or even better results than alignment-based methods. Especially for SARS-CoV-2, our method also confirms that bats are potential intermediate hosts of SARS-CoV-2.
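The description above only outlines the CCFV feature, so the following Python sketch is an illustration under stated assumptions rather than the authors' definition. It assumes the "L-step delay correlation" can be approximated by the Pearson correlation between the 0/1 position indicator of one nucleotide and the indicator of another nucleotide shifted by a lag, computed for all 16 ordered nucleotide pairs and each lag from 1 to L. The names `pearson` and `lagged_correlation_features` are hypothetical.

```python
from math import sqrt

NUCLEOTIDES = "ACGT"


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def lagged_correlation_features(seq, max_lag):
    """Return a list of 16 * max_lag lagged indicator correlations.

    Assumption (not taken from the paper): entry (lag, a, b) is the
    correlation between "nucleotide a at position i" and
    "nucleotide b at position i + lag".
    """
    seq = seq.upper()
    n = len(seq)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(seq)")

    # One 0/1 indicator list per nucleotide, marking where it occurs.
    indicators = {
        c: [1.0 if s == c else 0.0 for s in seq] for c in NUCLEOTIDES
    }

    features = []
    for lag in range(1, max_lag + 1):
        for a in NUCLEOTIDES:
            head = indicators[a][: n - lag]   # positions i
            for b in NUCLEOTIDES:
                tail = indicators[b][lag:]    # positions i + lag
                features.append(pearson(head, tail))
    return features
```

Under this assumption, each entry depends only on how often nucleotide a is followed by nucleotide b exactly `lag` positions later, which is one way to read the stated link to (L + 1)-spaced mers. Two genomes could then be compared by any vector distance between their feature lists; the choice of distance is not specified in the description.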
format Online
Article
Text
id pubmed-8493760
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Published by Elsevier B.V.
record_format MEDLINE/PubMed
spelling pubmed-8493760 2021-10-06 Alignment-free sequence comparison for virus genomes based on location correlation coefficient He, Lily Sun, Siyang Zhang, Qianyue Bao, Xiaona Li, Peter K. Infect Genet Evol Article Published by Elsevier B.V. 2021-12 2021-10-06 /pmc/articles/PMC8493760/ /pubmed/34626822 http://dx.doi.org/10.1016/j.meegid.2021.105106 Text en © 2021 Published by Elsevier B.V. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19.
The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
spellingShingle Article
He, Lily
Sun, Siyang
Zhang, Qianyue
Bao, Xiaona
Li, Peter K.
Alignment-free sequence comparison for virus genomes based on location correlation coefficient
title Alignment-free sequence comparison for virus genomes based on location correlation coefficient
title_full Alignment-free sequence comparison for virus genomes based on location correlation coefficient
title_fullStr Alignment-free sequence comparison for virus genomes based on location correlation coefficient
title_full_unstemmed Alignment-free sequence comparison for virus genomes based on location correlation coefficient
title_short Alignment-free sequence comparison for virus genomes based on location correlation coefficient
title_sort alignment-free sequence comparison for virus genomes based on location correlation coefficient
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8493760/
https://www.ncbi.nlm.nih.gov/pubmed/34626822
http://dx.doi.org/10.1016/j.meegid.2021.105106
work_keys_str_mv AT helily alignmentfreesequencecomparisonforvirusgenomesbasedonlocationcorrelationcoefficient
AT sunsiyang alignmentfreesequencecomparisonforvirusgenomesbasedonlocationcorrelationcoefficient
AT zhangqianyue alignmentfreesequencecomparisonforvirusgenomesbasedonlocationcorrelationcoefficient
AT baoxiaona alignmentfreesequencecomparisonforvirusgenomesbasedonlocationcorrelationcoefficient
AT lipeterk alignmentfreesequencecomparisonforvirusgenomesbasedonlocationcorrelationcoefficient