Positional Correlation Natural Vector: A Novel Method for Genome Comparison
Advances in sequencing technology have made large amounts of biological data available. Evolutionary analysis of data such as DNA sequences is highly important in biological studies. As alignment methods are ineffective for analyzing large-scale data due to their inherently high costs, alignment-fre...
Main Authors: | He, Lily; Dong, Rui; He, Rong Lucy; Yau, Stephen S.-T. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7312176/ https://www.ncbi.nlm.nih.gov/pubmed/32485813 http://dx.doi.org/10.3390/ijms21113859 |
_version_ | 1783549670999457792 |
---|---|
author | He, Lily; Dong, Rui; He, Rong Lucy; Yau, Stephen S.-T. |
author_facet | He, Lily; Dong, Rui; He, Rong Lucy; Yau, Stephen S.-T. |
author_sort | He, Lily |
collection | PubMed |
description | Advances in sequencing technology have made large amounts of biological data available. Evolutionary analysis of data such as DNA sequences is highly important in biological studies. As alignment methods are ineffective for analyzing large-scale data due to their inherently high costs, alignment-free methods have recently attracted attention in the field of bioinformatics. In this paper, we introduce a new positional correlation natural vector (PCNV) method that involves converting a DNA sequence into an 18-dimensional numerical feature vector. Using frequency and position correlation to represent the nucleotide distribution, it is possible to obtain a PCNV for a DNA sequence. This new numerical vector design uses six suitable features to characterize the correlation among nucleotide positions in sequences. PCNV is also very easy to compute and can be used for rapid genome comparison. To test our novel method, we performed phylogenetic analysis with several viral and bacterial genome datasets with PCNV. For comparison, an alignment-based method, Bayesian inference, and two alignment-free methods, feature frequency profile and natural vector, were performed using the same datasets. We found that the PCNV technique is fast and accurate when used for phylogenetic analysis and classification of viruses and bacteria. |
format | Online Article Text |
id | pubmed-7312176 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7312176 2020-06-26 Positional Correlation Natural Vector: A Novel Method for Genome Comparison He, Lily; Dong, Rui; He, Rong Lucy; Yau, Stephen S.-T. Int J Mol Sci Article MDPI 2020-05-29 /pmc/articles/PMC7312176/ /pubmed/32485813 http://dx.doi.org/10.3390/ijms21113859 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article He, Lily Dong, Rui He, Rong Lucy Yau, Stephen S.-T. Positional Correlation Natural Vector: A Novel Method for Genome Comparison |
title | Positional Correlation Natural Vector: A Novel Method for Genome Comparison |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7312176/ https://www.ncbi.nlm.nih.gov/pubmed/32485813 http://dx.doi.org/10.3390/ijms21113859 |