Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector
Genome comparison is a vital research area of bioinformatics. For large-scale genome comparisons, Multiple Sequence Alignment (MSA) methods have been impractical because of their algorithmic complexity. In this study, we propose a novel alignment-free method based on the one-to-one corresponden...
Main authors: | Pei, Shaojun; Dong, Rui; He, Rong Lucy; Yau, Stephen S.-T. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6661692/ https://www.ncbi.nlm.nih.gov/pubmed/31384399 http://dx.doi.org/10.1016/j.csbj.2019.07.003 |
_version_ | 1783439503783886848 |
---|---|
author | Pei, Shaojun Dong, Rui He, Rong Lucy Yau, Stephen S.-T. |
author_facet | Pei, Shaojun Dong, Rui He, Rong Lucy Yau, Stephen S.-T. |
author_sort | Pei, Shaojun |
collection | PubMed |
description | Genome comparison is a vital research area of bioinformatics. For large-scale genome comparisons, Multiple Sequence Alignment (MSA) methods have been impractical because of their algorithmic complexity. In this study, we propose a novel alignment-free method based on the one-to-one correspondence between a DNA sequence and its complete central moment vector of the cumulative Fourier power and phase spectra. In addition, the covariance between the four nucleotides in the power and phase spectra is included. We use the cumulative Fourier power and phase spectra to define a 28-dimensional vector for each DNA sequence. Euclidean distances between the vectors measure the dissimilarity between DNA sequences. We test the method on datasets of different sizes and types, including simulated DNA sequences, exon-intron sequences, and complete genomes. The results show that our method is more accurate and efficient for hierarchical clustering than other alignment-free methods and MSA methods. |
format | Online Article Text |
id | pubmed-6661692 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-6661692 2019-08-05 Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector Pei, Shaojun Dong, Rui He, Rong Lucy Yau, Stephen S.-T. Comput Struct Biotechnol J Research Article Genome comparison is a vital research area of bioinformatics. For large-scale genome comparisons, Multiple Sequence Alignment (MSA) methods have been impractical because of their algorithmic complexity. In this study, we propose a novel alignment-free method based on the one-to-one correspondence between a DNA sequence and its complete central moment vector of the cumulative Fourier power and phase spectra. In addition, the covariance between the four nucleotides in the power and phase spectra is included. We use the cumulative Fourier power and phase spectra to define a 28-dimensional vector for each DNA sequence. Euclidean distances between the vectors measure the dissimilarity between DNA sequences. We test the method on datasets of different sizes and types, including simulated DNA sequences, exon-intron sequences, and complete genomes. The results show that our method is more accurate and efficient for hierarchical clustering than other alignment-free methods and MSA methods. Research Network of Computational and Structural Biotechnology 2019-07-11 /pmc/articles/PMC6661692/ /pubmed/31384399 http://dx.doi.org/10.1016/j.csbj.2019.07.003 Text en © 2019 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Research Article Pei, Shaojun Dong, Rui He, Rong Lucy Yau, Stephen S.-T. Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector |
title | Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector |
title_full | Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector |
title_fullStr | Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector |
title_full_unstemmed | Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector |
title_short | Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector |
title_sort | large-scale genome comparison based on cumulative fourier power and phase spectra: central moment and covariance vector |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6661692/ https://www.ncbi.nlm.nih.gov/pubmed/31384399 http://dx.doi.org/10.1016/j.csbj.2019.07.003 |
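
The description field in this record outlines a pipeline: turn a DNA sequence into cumulative Fourier power and phase spectra, summarize them with moments and inter-nucleotide covariances, and compare sequences by Euclidean distance. The following is a minimal Python sketch of that general idea, not the authors' implementation. The record does not give the actual definition of the 28-dimensional vector, so the layout used here (per spectrum: 4 means, 4 second central moments, 6 pairwise covariances) and the scaling by sequence length are assumptions made for illustration only. All function names are hypothetical.

```python
import numpy as np

NUCLEOTIDES = "ACGT"


def cumulative_spectra(seq):
    """Return cumulative power and phase spectra, one row per nucleotide."""
    symbols = np.array(list(seq.upper()))
    n = symbols.size
    if n < 3:
        raise ValueError("sequence must contain at least 3 symbols")
    # One 0/1 indicator row per nucleotide; other symbols (e.g. N) match no row.
    indicators = np.stack([(symbols == nt).astype(float) for nt in NUCLEOTIDES])
    # Discrete Fourier transform of each row, dropping the zero-frequency term.
    spectrum = np.fft.fft(indicators, axis=1)[:, 1:]
    # Assumed scaling (not from the record) so different lengths are comparable.
    power = np.abs(spectrum) ** 2 / n**2
    phase = np.angle(spectrum) / (np.pi * (n - 1))
    return np.cumsum(power, axis=1), np.cumsum(phase, axis=1)


def summarize(rows):
    """Summarize 4 cumulative spectra as 4 means, 4 variances, 6 covariances."""
    cov = np.cov(rows, bias=True)  # 4 x 4 matrix, normalized by the sample count
    pairwise = cov[np.triu_indices(len(NUCLEOTIDES), k=1)]
    return np.concatenate([rows.mean(axis=1), np.diag(cov), pairwise])


def feature_vector(seq):
    """Hypothetical 28-number summary: 14 from power plus 14 from phase."""
    cum_power, cum_phase = cumulative_spectra(seq)
    return np.concatenate([summarize(cum_power), summarize(cum_phase)])


def euclidean_distance(seq_a, seq_b):
    """Dissimilarity of two sequences as the distance between their vectors."""
    return float(np.linalg.norm(feature_vector(seq_a) - feature_vector(seq_b)))
```

Because each sequence is reduced to a fixed-length vector, sequences of different lengths can be compared directly, and a matrix of pairwise distances could then be passed to any standard hierarchical clustering routine.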