A novel clustering method via nucleotide-based Fourier power spectrum analysis

A novel clustering method is proposed to classify genes or genomes. This method uses a natural representation of genomic data by binary indicator sequences of each nucleotide (adenine (A), cytosine (C), guanine (G), and thymine (T)). Afterwards, the discrete Fourier transform is applied to these ind...
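As a rough illustration of the first two steps named in the abstract (binary indicator sequences per nucleotide, then a discrete Fourier transform of each), here is a minimal Python sketch. The function names and the toy sequence are hypothetical, and the naive O(N^2) transform is used only to keep the example dependency-free; it is not the authors' implementation.

```python
import cmath

NUCLEOTIDES = "ACGT"

def indicator_sequences(seq):
    """Return one binary indicator sequence per nucleotide (1 where it occurs)."""
    seq = seq.upper()
    return {n: [1 if base == n else 0 for base in seq] for n in NUCLEOTIDES}

def power_spectrum(u):
    """Naive DFT power spectrum |U(k)|^2 for k = 0..N-1."""
    n = len(u)
    spectrum = []
    for k in range(n):
        coeff = sum(u[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

# Hypothetical toy sequence, not data from the paper.
seq = "ACGTAC"
indicators = indicator_sequences(seq)
spectra = {n: power_spectrum(u) for n, u in indicators.items()}
```

For real genomes a fast Fourier transform routine would replace the double loop; the zero-frequency term of each spectrum is simply the squared count of that nucleotide.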

Bibliographic Details
Main Authors: Zhao, Bo, Duan, Victor, Yau, Stephen S.-T.
Format: Online Article Text
Language: English
Published: Elsevier Ltd. 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7094093/
https://www.ncbi.nlm.nih.gov/pubmed/21443881
http://dx.doi.org/10.1016/j.jtbi.2011.03.029
author Zhao, Bo
Duan, Victor
Yau, Stephen S.-T.
collection PubMed
description A novel clustering method is proposed to classify genes or genomes. This method uses a natural representation of genomic data by binary indicator sequences of each nucleotide (adenine (A), cytosine (C), guanine (G), and thymine (T)). Afterwards, the discrete Fourier transform is applied to these indicator sequences to calculate spectra of the nucleotides. Mathematical moments are calculated for each of these spectra to create a multidimensional vector in a Euclidean space for each gene or genome sequence. Thus, each gene or genome sequence is realized as a geometric point in the Euclidean space. Finally, pairwise Euclidean distances between these points (i.e. genome sequences) are calculated to cluster the gene or genome sequences. This method is applied to three sets of data. The first is genomic data from 34 strains of coronavirus, the second is 118 of the known strains of Human rhinovirus (HRV), and the third is 30 bacterial genomes. A distance matrix is computed for each of the three sets, showing the distance from each point to the others. We used the complete linkage clustering algorithm to build phylogenetic trees that indicate how the distances among these sequences correspond to the evolutionary relationships among them. This genome representation provides a powerful and efficient method to classify genomes and is much faster than the widely acknowledged multiple sequence alignment method.
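The remaining steps in the description (spectral moments as coordinates, pairwise Euclidean distances, complete linkage clustering) can be sketched as follows. This is a hedged illustration only: the record does not give the moment formula, so the normalization in `moment_vector` is a placeholder assumption, and all function names and toy sequences are hypothetical rather than taken from the paper.

```python
import cmath
import math

NUCLEOTIDES = "ACGT"

def power_spectrum(u):
    """Naive DFT power spectrum |U(k)|^2 of a binary indicator sequence."""
    n = len(u)
    return [abs(sum(u[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2 for k in range(n)]

def moment_vector(seq, orders=(1, 2, 3)):
    """Map a sequence to a point: a few spectral moments per nucleotide.

    ASSUMPTION: moment j = (1/N) * sum_{k>=1} (PS(k)/N)^j is a placeholder
    normalization, not the formula used in the paper.
    """
    seq = seq.upper()
    n = len(seq)
    vec = []
    for nuc in NUCLEOTIDES:
        ps = power_spectrum([1 if b == nuc else 0 for b in seq])
        for j in orders:
            vec.append(sum((p / n) ** j for p in ps[1:]) / n)
    return vec

def distance_matrix(points):
    """Pairwise Euclidean distances between the moment vectors."""
    return [[math.dist(p, q) for q in points] for p in points]

def complete_linkage(dist):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, height)."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the farthest member pair.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges

# Hypothetical toy sequences of unequal length, not data from the paper.
names = ["s1", "s2", "s3", "s4"]
seqs = ["ACGTACGTACGT", "ACGTACGTACGA", "GGGGCCCCGGGGCC", "GGGCCCCGGGGC"]
points = [moment_vector(s) for s in seqs]
dist = distance_matrix(points)
merges = complete_linkage(dist)
```

Because every sequence is reduced to a fixed-length vector, sequences of different lengths can be compared directly without alignment; the list of merges, read in order, gives the tree topology and branch heights.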
format Online
Article
Text
id pubmed-7094093
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Elsevier Ltd.
record_format MEDLINE/PubMed
spelling pubmed-7094093 2020-03-25 A novel clustering method via nucleotide-based Fourier power spectrum analysis Zhao, Bo Duan, Victor Yau, Stephen S.-T. J Theor Biol Article Elsevier Ltd. 2011-06-21 2011-03-26 /pmc/articles/PMC7094093/ /pubmed/21443881 http://dx.doi.org/10.1016/j.jtbi.2011.03.029 Text en Copyright © 2011 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title A novel clustering method via nucleotide-based Fourier power spectrum analysis
topic Article