
Positional Correlation Natural Vector: A Novel Method for Genome Comparison

Advances in sequencing technology have made large amounts of biological data available. Evolutionary analysis of data such as DNA sequences is highly important in biological studies. As alignment methods are ineffective for analyzing large-scale data due to their inherently high costs, alignment-fre...


Bibliographic Details
Main authors: He, Lily; Dong, Rui; He, Rong Lucy; Yau, Stephen S.-T.
Format: Online Article Text
Language: English
Published: MDPI 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7312176/
https://www.ncbi.nlm.nih.gov/pubmed/32485813
http://dx.doi.org/10.3390/ijms21113859
author He, Lily
Dong, Rui
He, Rong Lucy
Yau, Stephen S.-T.
collection PubMed
description Advances in sequencing technology have made large amounts of biological data available. Evolutionary analysis of data such as DNA sequences is highly important in biological studies. As alignment methods are ineffective for analyzing large-scale data due to their inherently high costs, alignment-free methods have recently attracted attention in the field of bioinformatics. In this paper, we introduce a new positional correlation natural vector (PCNV) method that involves converting a DNA sequence into an 18-dimensional numerical feature vector. Using frequency and position correlation to represent the nucleotide distribution, it is possible to obtain a PCNV for a DNA sequence. This new numerical vector design uses six suitable features to characterize the correlation among nucleotide positions in sequences. PCNV is also very easy to compute and can be used for rapid genome comparison. To test our novel method, we performed phylogenetic analysis with several viral and bacterial genome datasets with PCNV. For comparison, an alignment-based method, Bayesian inference, and two alignment-free methods, feature frequency profile and natural vector, were performed using the same datasets. We found that the PCNV technique is fast and accurate when used for phylogenetic analysis and classification of viruses and bacteria.
format Online
Article
Text
id pubmed-7312176
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7312176 2020-06-26 Positional Correlation Natural Vector: A Novel Method for Genome Comparison He, Lily; Dong, Rui; He, Rong Lucy; Yau, Stephen S.-T. Int J Mol Sci Article MDPI 2020-05-29 /pmc/articles/PMC7312176/ /pubmed/32485813 http://dx.doi.org/10.3390/ijms21113859 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Positional Correlation Natural Vector: A Novel Method for Genome Comparison
topic Article