A novel hierarchical clustering algorithm for gene sequences

BACKGROUND: Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. This method transforms DNA sequences into the feature vectors which contain the...

Bibliographic Details
Main Authors: Wei, Dan, Jiang, Qingshan, Wei, Yanjie, Wang, Shengrui
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3443659/
https://www.ncbi.nlm.nih.gov/pubmed/22823405
http://dx.doi.org/10.1186/1471-2105-13-174
author Wei, Dan
Jiang, Qingshan
Wei, Yanjie
Wang, Shengrui
collection PubMed
description BACKGROUND: Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. The method transforms DNA sequences into feature vectors that capture the occurrence, location, and order relation of k-tuples in each sequence. A hierarchical procedure is then applied to cluster the sequences based on these feature vectors. RESULTS: The proposed distance measure and clustering method are evaluated by clustering functionally related genes and by phylogenetic analysis. The method is also compared with BlastClust, CD-HIT-EST, and several other methods. The experimental results show that our method is effective in classifying DNA sequences with similar biological characteristics and in discovering the underlying relationships among the sequences. CONCLUSIONS: We introduce a novel clustering algorithm based on a new sequence similarity measure. It is effective in classifying DNA sequences with similar biological characteristics and in discovering the relationships among the sequences.
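The general idea in the abstract (k-tuple feature vectors followed by a hierarchical procedure) can be illustrated with a minimal Python sketch. This is not the authors' DMk measure or the mBKM algorithm, whose definitions are not given in this record. As stated assumptions, the sketch uses normalized k-tuple counts plus normalized mean positions as features, plain Euclidean distance, and simple average-linkage agglomerative clustering as a stand-in. The function names `ktuple_features` and `agglomerate` are hypothetical.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def ktuple_features(seq, k=2):
    """Assumed feature vector: normalized counts of every k-tuple,
    followed by the normalized mean (1-based) position of each k-tuple.
    This is an illustration only, not the paper's DMk features."""
    tuples = ["".join(p) for p in product(ALPHABET, repeat=k)]
    positions = {t: [] for t in tuples}
    n_windows = max(len(seq) - k + 1, 1)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in positions:  # skip windows with ambiguous bases such as N
            positions[word].append(i + 1)
    counts = [len(positions[t]) / n_windows for t in tuples]
    mean_pos = [
        sum(positions[t]) / len(positions[t]) / n_windows if positions[t] else 0.0
        for t in tuples
    ]
    return counts + mean_pos


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering (a swapped-in technique,
    not mBKM). Returns a list of clusters, each a list of input indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(euclidean(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        x, y = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    # Toy sequences invented for illustration; not data from the paper.
    seqs = ["ACACACACAC", "ACACACACAT", "GGTTGGTTGG", "GGTTGGTTGA"]
    features = [ktuple_features(s, k=2) for s in seqs]
    print(agglomerate(features, n_clusters=2))
```

Because the features are computed without aligning the sequences, the sketch is alignment-free in the same broad sense as the abstract describes, although the quadratic pairwise search used here would only suit small inputs.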
format Online
Article
Text
id pubmed-3443659
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3443659 2012-09-18 A novel hierarchical clustering algorithm for gene sequences Wei, Dan Jiang, Qingshan Wei, Yanjie Wang, Shengrui BMC Bioinformatics Methodology Article BioMed Central 2012-07-23 /pmc/articles/PMC3443659/ /pubmed/22823405 http://dx.doi.org/10.1186/1471-2105-13-174 Text en Copyright ©2012 Wei et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A novel hierarchical clustering algorithm for gene sequences
topic Methodology Article