A novel clustering method via nucleotide-based Fourier power spectrum analysis
A novel clustering method is proposed to classify genes or genomes. This method uses a natural representation of genomic data by binary indicator sequences of each nucleotide (adenine (A), cytosine (C), guanine (G), and thymine (T)). Afterwards, the discrete Fourier transform is applied to these ind...
Main Authors: | Zhao, Bo; Duan, Victor; Yau, Stephen S.-T. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier Ltd. 2011 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7094093/ https://www.ncbi.nlm.nih.gov/pubmed/21443881 http://dx.doi.org/10.1016/j.jtbi.2011.03.029 |
_version_ | 1783510398569283584 |
---|---|
author | Zhao, Bo Duan, Victor Yau, Stephen S.-T. |
author_facet | Zhao, Bo Duan, Victor Yau, Stephen S.-T. |
author_sort | Zhao, Bo |
collection | PubMed |
description | A novel clustering method is proposed to classify genes or genomes. This method uses a natural representation of genomic data by binary indicator sequences of each nucleotide (adenine (A), cytosine (C), guanine (G), and thymine (T)). Afterwards, the discrete Fourier transform is applied to these indicator sequences to calculate spectra of the nucleotides. Mathematical moments are calculated for each of these spectra to create a multidimensional vector in a Euclidean space for each gene or genome sequence. Thus, each gene or genome sequence is realized as a geometric point in the Euclidean space. Finally, pairwise Euclidean distances between these points (i.e., genome sequences) are calculated to cluster the gene or genome sequences. This method is applied to three sets of data. The first is 34 strains of coronavirus genomic data, the second is 118 of the known strains of Human rhinovirus (HRV), and the third is 30 bacterial genomes. The distance matrices are computed based on the three sets, showing the distances from each point to the others. We used the complete linkage clustering algorithm to build phylogenetic trees to indicate how the distances among these sequences correspond to the evolutionary relationships among them. This genome representation provides a powerful and efficient method to classify genomes and is much faster than the widely acknowledged multiple sequence alignment method. |
format | Online Article Text |
id | pubmed-7094093 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Elsevier Ltd. |
record_format | MEDLINE/PubMed |
spelling | pubmed-70940932020-03-25 A novel clustering method via nucleotide-based Fourier power spectrum analysis Zhao, Bo Duan, Victor Yau, Stephen S.-T. J Theor Biol Article A novel clustering method is proposed to classify genes or genomes. This method uses a natural representation of genomic data by binary indicator sequences of each nucleotide (adenine (A), cytosine (C), guanine (G), and thymine (T)). Afterwards, the discrete Fourier transform is applied to these indicator sequences to calculate spectra of the nucleotides. Mathematical moments are calculated for each of these spectra to create a multidimensional vector in a Euclidean space for each gene or genome sequence. Thus, each gene or genome sequence is realized as a geometric point in the Euclidean space. Finally, pairwise Euclidean distances between these points (i.e., genome sequences) are calculated to cluster the gene or genome sequences. This method is applied to three sets of data. The first is 34 strains of coronavirus genomic data, the second is 118 of the known strains of Human rhinovirus (HRV), and the third is 30 bacterial genomes. The distance matrices are computed based on the three sets, showing the distances from each point to the others. We used the complete linkage clustering algorithm to build phylogenetic trees to indicate how the distances among these sequences correspond to the evolutionary relationships among them. This genome representation provides a powerful and efficient method to classify genomes and is much faster than the widely acknowledged multiple sequence alignment method. Elsevier Ltd. 2011-06-21 2011-03-26 /pmc/articles/PMC7094093/ /pubmed/21443881 http://dx.doi.org/10.1016/j.jtbi.2011.03.029 Text en Copyright © 2011 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. 
The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Article Zhao, Bo Duan, Victor Yau, Stephen S.-T. A novel clustering method via nucleotide-based Fourier power spectrum analysis |
title | A novel clustering method via nucleotide-based Fourier power spectrum analysis |
title_full | A novel clustering method via nucleotide-based Fourier power spectrum analysis |
title_fullStr | A novel clustering method via nucleotide-based Fourier power spectrum analysis |
title_full_unstemmed | A novel clustering method via nucleotide-based Fourier power spectrum analysis |
title_short | A novel clustering method via nucleotide-based Fourier power spectrum analysis |
title_sort | novel clustering method via nucleotide-based fourier power spectrum analysis |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7094093/ https://www.ncbi.nlm.nih.gov/pubmed/21443881 http://dx.doi.org/10.1016/j.jtbi.2011.03.029 |
work_keys_str_mv | AT zhaobo anovelclusteringmethodvianucleotidebasedfourierpowerspectrumanalysis AT duanvictor anovelclusteringmethodvianucleotidebasedfourierpowerspectrumanalysis AT yaustephenst anovelclusteringmethodvianucleotidebasedfourierpowerspectrumanalysis AT zhaobo novelclusteringmethodvianucleotidebasedfourierpowerspectrumanalysis AT duanvictor novelclusteringmethodvianucleotidebasedfourierpowerspectrumanalysis AT yaustephenst novelclusteringmethodvianucleotidebasedfourierpowerspectrumanalysis |
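
The pipeline summarized in the description field (binary indicator sequences, discrete Fourier transform, spectral moments, pairwise Euclidean distances, complete linkage clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the choice of moment orders 1 to 3, and the unnormalized raw moments are all assumptions made for illustration, since the record does not give the paper's exact moment definition or normalization. The direct O(n²) DFT is only practical for short toy sequences; a real genome would call for an FFT routine.

```python
import cmath
import math

NUCLEOTIDES = "ACGT"


def indicator_sequence(seq, nucleotide):
    """Binary indicator sequence: 1 where `nucleotide` occurs, else 0."""
    return [1 if base == nucleotide else 0 for base in seq.upper()]


def power_spectrum(signal):
    """Power spectrum |X[k]|^2 via a direct O(n^2) DFT (toy inputs only)."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectrum_moments(spectrum, orders=(1, 2, 3)):
    """ASSUMPTION: raw moments, i.e. the mean of PS[k]**j for each order j.

    The paper's own moment definition is not stated in this record.
    """
    n = len(spectrum)
    return [sum(p ** j for p in spectrum) / n for j in orders]


def feature_vector(seq):
    """Concatenate the moments of the A, C, G and T spectra (12 numbers)."""
    vector = []
    for nucleotide in NUCLEOTIDES:
        ps = power_spectrum(indicator_sequence(seq, nucleotide))
        vector.extend(spectrum_moments(ps))
    return vector


def distance_matrix(vectors):
    """Pairwise Euclidean distances between feature vectors."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]


def complete_linkage(dist):
    """Naive agglomerative clustering with complete linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of original sequence indices.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest
                # pairwise distance between their members.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return merges


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    toy = ["ACGTACGTACGT", "ACGTACGTACGT", "AAAAAAAATTTT"]
    vectors = [feature_vector(s) for s in toy]
    for step in complete_linkage(distance_matrix(vectors)):
        print(step)
```

Because each sequence is reduced to a fixed-length vector regardless of its length, sequences of different lengths can be compared directly without alignment, which is the property the description credits for the speed advantage over multiple sequence alignment.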