A novel hierarchical clustering algorithm for gene sequences
BACKGROUND: Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. This method transforms DNA sequences into the feature vectors which contain the...
Main authors: | Wei, Dan; Jiang, Qingshan; Wei, Yanjie; Wang, Shengrui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3443659/ https://www.ncbi.nlm.nih.gov/pubmed/22823405 http://dx.doi.org/10.1186/1471-2105-13-174 |
_version_ | 1782243590029705216 |
---|---|
author | Wei, Dan; Jiang, Qingshan; Wei, Yanjie; Wang, Shengrui |
author_sort | Wei, Dan |
collection | PubMed |
description | BACKGROUND: Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. This method transforms DNA sequences into the feature vectors which contain the occurrence, location and order relation of k-tuples in DNA sequence. Afterwards, a hierarchical procedure is applied to clustering DNA sequences based on the feature vectors. RESULTS: The proposed distance measure and clustering method are evaluated by clustering functionally related genes and by phylogenetic analysis. This method is also compared with BlastClust, CD-HIT-EST and some others. The experimental results show our method is effective in classifying DNA sequences with similar biological characteristics and in discovering the underlying relationship among the sequences. CONCLUSIONS: We introduced a novel clustering algorithm which is based on a new sequence similarity measure. It is effective in classifying DNA sequences with similar biological characteristics and in discovering the relationship among the sequences. |
format | Online Article Text |
id | pubmed-3443659 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3443659 2012-09-18 A novel hierarchical clustering algorithm for gene sequences Wei, Dan; Jiang, Qingshan; Wei, Yanjie; Wang, Shengrui BMC Bioinformatics Methodology Article BioMed Central 2012-07-23 /pmc/articles/PMC3443659/ /pubmed/22823405 http://dx.doi.org/10.1186/1471-2105-13-174 Text en Copyright ©2012 Wei et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A novel hierarchical clustering algorithm for gene sequences |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3443659/ https://www.ncbi.nlm.nih.gov/pubmed/22823405 http://dx.doi.org/10.1186/1471-2105-13-174 |
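The description above outlines an alignment-free pipeline: turn each DNA sequence into a k-tuple feature vector, then cluster the vectors hierarchically. The sketch below illustrates that general idea only. It is not the paper's DMk measure or its mBKM procedure, neither of which is specified in this record. The function names (`ktuple_features`, `agglomerate`), the Euclidean distance, and the average-linkage merging are all assumptions chosen for illustration, and the features cover only occurrence and mean location of each k-tuple, not the order relation.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def ktuple_features(seq, k=2):
    """Illustrative feature vector (an assumption, not the paper's DMk):
    normalised k-tuple counts followed by each k-tuple's mean relative position."""
    n = len(seq) - k + 1
    if n <= 0:
        raise ValueError("sequence is shorter than k")
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(words, 0)
    pos_sum = dict.fromkeys(words, 0.0)
    for i in range(n):
        word = seq[i:i + k]
        if word in counts:  # skip words containing ambiguous bases such as N
            counts[word] += 1
            pos_sum[word] += (i + 1) / n
    freqs = [counts[w] / n for w in words]
    mean_pos = [pos_sum[w] / counts[w] if counts[w] else 0.0 for w in words]
    return freqs + mean_pos


def euclidean(a, b):
    """Plain Euclidean distance, used here as a stand-in distance measure."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of input indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy input, invented for this example (not data from the paper).
sequences = ["AAAAAAAAAA", "AAAAAAAAAT", "GCGCGCGCGC", "GCGCGCGCGA"]
vectors = [ktuple_features(s, k=2) for s in sequences]
print(agglomerate(vectors, n_clusters=2))
```

The pairwise search makes this quadratic in the number of clusters at every merge, which is fine for a handful of sequences but would need a proper library implementation for real gene sets.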