
Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector

Genome comparison is a vital research area of bioinformatics. For large-scale genome comparisons, Multiple Sequence Alignment (MSA) methods have been impractical because of their algorithmic complexity. In this study, we propose a novel alignment-free method based on the one-to-one corresponden...


Bibliographic Details
Main Authors: Pei, Shaojun, Dong, Rui, He, Rong Lucy, Yau, Stephen S.-T.
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6661692/
https://www.ncbi.nlm.nih.gov/pubmed/31384399
http://dx.doi.org/10.1016/j.csbj.2019.07.003
_version_ 1783439503783886848
author Pei, Shaojun
Dong, Rui
He, Rong Lucy
Yau, Stephen S.-T.
author_facet Pei, Shaojun
Dong, Rui
He, Rong Lucy
Yau, Stephen S.-T.
author_sort Pei, Shaojun
collection PubMed
description Genome comparison is a vital research area of bioinformatics. For large-scale genome comparisons, Multiple Sequence Alignment (MSA) methods have been impractical because of their algorithmic complexity. In this study, we propose a novel alignment-free method based on the one-to-one correspondence between a DNA sequence and its complete central moment vector of the cumulative Fourier power and phase spectra. In addition, the covariance between the four nucleotides in the power and phase spectra is included. We use the cumulative Fourier power and phase spectra to define a 28-dimensional vector for each DNA sequence. Euclidean distances between the vectors can measure the dissimilarity between DNA sequences. We perform testing with datasets of different sizes and types including simulated DNA sequences, exon-intron and complete genomes. The results show that our method is more accurate and efficient for performing hierarchical clustering than other alignment-free methods and MSA methods.
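The pipeline summarized in the description (indicator sequences, cumulative Fourier power and phase spectra, moments and covariances, Euclidean distance) can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the authors' published construction. The abstract does not say how the 28 components are composed, so the layout used here is an assumption chosen because it happens to give 28 values: a mean and a second central moment per nucleotide per spectrum, plus six pairwise covariances per spectrum. Dropping the zero-frequency term and the function names `cumulative_spectra`, `feature_vector`, and `distance` are likewise hypothetical.

```python
import numpy as np
from itertools import combinations

NUCLEOTIDES = "ACGT"

def cumulative_spectra(seq, base):
    """Cumulative power and phase spectra of one nucleotide's 0/1 indicator sequence."""
    u = np.array([1.0 if ch == base else 0.0 for ch in seq.upper()])
    U = np.fft.fft(u)[1:]          # assumption: the zero-frequency term is dropped
    power = np.abs(U) ** 2         # Fourier power spectrum
    phase = np.angle(U)            # Fourier phase spectrum, in radians
    return np.cumsum(power), np.cumsum(phase)

def feature_vector(seq):
    """Illustrative 28-dimensional vector (assumed layout, see the lead-in)."""
    cum_power, cum_phase = {}, {}
    for b in NUCLEOTIDES:
        cum_power[b], cum_phase[b] = cumulative_spectra(seq, b)

    feats = []
    for spectra in (cum_power, cum_phase):
        # Mean and second central moment for each nucleotide: 4 * 2 = 8 values.
        for b in NUCLEOTIDES:
            x = spectra[b]
            feats.append(x.mean())
            feats.append(np.mean((x - x.mean()) ** 2))
        # Covariance between each pair of nucleotides: 6 values.
        for a, b in combinations(NUCLEOTIDES, 2):
            feats.append(np.cov(spectra[a], spectra[b])[0, 1])
    return np.array(feats)         # 2 * (8 + 6) = 28 components

def distance(seq_a, seq_b):
    """Euclidean distance between the feature vectors of two sequences."""
    return float(np.linalg.norm(feature_vector(seq_a) - feature_vector(seq_b)))
```

Because every sequence maps to a vector of the same fixed length, sequences of different lengths can be compared directly, and the pairwise distance matrix can be passed to any standard hierarchical clustering routine.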
format Online
Article
Text
id pubmed-6661692
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-66616922019-08-05 Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector Pei, Shaojun Dong, Rui He, Rong Lucy Yau, Stephen S.-T. Comput Struct Biotechnol J Research Article Genome comparison is a vital research area of bioinformatics. For large-scale genome comparisons, Multiple Sequence Alignment (MSA) methods have been impractical because of their algorithmic complexity. In this study, we propose a novel alignment-free method based on the one-to-one correspondence between a DNA sequence and its complete central moment vector of the cumulative Fourier power and phase spectra. In addition, the covariance between the four nucleotides in the power and phase spectra is included. We use the cumulative Fourier power and phase spectra to define a 28-dimensional vector for each DNA sequence. Euclidean distances between the vectors can measure the dissimilarity between DNA sequences. We perform testing with datasets of different sizes and types including simulated DNA sequences, exon-intron and complete genomes. The results show that our method is more accurate and efficient for performing hierarchical clustering than other alignment-free methods and MSA methods. Research Network of Computational and Structural Biotechnology 2019-07-11 /pmc/articles/PMC6661692/ /pubmed/31384399 http://dx.doi.org/10.1016/j.csbj.2019.07.003 Text en © 2019 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6661692/
https://www.ncbi.nlm.nih.gov/pubmed/31384399
http://dx.doi.org/10.1016/j.csbj.2019.07.003