A Hybrid Distance Measure for Clustering Expressed Sequence Tags Originating from the Same Gene Family
BACKGROUND: Clustering is a key step in the processing of Expressed Sequence Tags (ESTs). The primary goal of clustering is to put ESTs from the same transcript of a single gene into a unique cluster. Recent EST clustering algorithms mostly adopt the alignment-free distance measures, where they tend...
Main Authors: | Ng, Keng-Hoong, Ho, Chin-Kuan, Phon-Amnuaisuk, Somnuk |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2012 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3469558/ https://www.ncbi.nlm.nih.gov/pubmed/23071763 http://dx.doi.org/10.1371/journal.pone.0047216 |
_version_ | 1782246110677434368 |
---|---|
author | Ng, Keng-Hoong Ho, Chin-Kuan Phon-Amnuaisuk, Somnuk |
author_facet | Ng, Keng-Hoong Ho, Chin-Kuan Phon-Amnuaisuk, Somnuk |
author_sort | Ng, Keng-Hoong |
collection | PubMed |
description | BACKGROUND: Clustering is a key step in the processing of Expressed Sequence Tags (ESTs). The primary goal of clustering is to put ESTs from the same transcript of a single gene into a unique cluster. Recent EST clustering algorithms mostly adopt the alignment-free distance measures, where they tend to yield acceptable clustering accuracies with reasonable computational time. Despite the fact that these clustering methods work satisfactorily on a majority of the EST datasets, they have a common weakness. They are prone to deliver unsatisfactory clustering results when dealing with ESTs from the genes derived from the same family. The root cause is the distance measures applied on them are not sensitive enough to separate these closely related genes. METHODOLOGY/PRINCIPAL FINDINGS: We propose a hybrid distance measure that combines the global and local features extracted from ESTs, with the aim to address the clustering problem faced by ESTs derived from the same gene family. The clustering process is implemented using the DBSCAN algorithm. We test the hybrid distance measure on the ten EST datasets, and the clustering results are compared with the two alignment-free EST clustering tools, i.e. wcd and PEACE. The clustering results indicate that the proposed hybrid distance measure performs relatively better (in terms of clustering accuracy) than both EST clustering tools. CONCLUSIONS/SIGNIFICANCE: The clustering results provide support for the effectiveness of the proposed hybrid distance measure in solving the clustering problem for ESTs that originate from the same gene family. The improvement of clustering accuracies on the experimental datasets has supported the claim that the sensitivity of the hybrid distance measure is sufficient to solve the clustering problem. |
format | Online Article Text |
id | pubmed-3469558 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-34695582012-10-15 A Hybrid Distance Measure for Clustering Expressed Sequence Tags Originating from the Same Gene Family Ng, Keng-Hoong Ho, Chin-Kuan Phon-Amnuaisuk, Somnuk PLoS One Research Article BACKGROUND: Clustering is a key step in the processing of Expressed Sequence Tags (ESTs). The primary goal of clustering is to put ESTs from the same transcript of a single gene into a unique cluster. Recent EST clustering algorithms mostly adopt the alignment-free distance measures, where they tend to yield acceptable clustering accuracies with reasonable computational time. Despite the fact that these clustering methods work satisfactorily on a majority of the EST datasets, they have a common weakness. They are prone to deliver unsatisfactory clustering results when dealing with ESTs from the genes derived from the same family. The root cause is the distance measures applied on them are not sensitive enough to separate these closely related genes. METHODOLOGY/PRINCIPAL FINDINGS: We propose a hybrid distance measure that combines the global and local features extracted from ESTs, with the aim to address the clustering problem faced by ESTs derived from the same gene family. The clustering process is implemented using the DBSCAN algorithm. We test the hybrid distance measure on the ten EST datasets, and the clustering results are compared with the two alignment-free EST clustering tools, i.e. wcd and PEACE. The clustering results indicate that the proposed hybrid distance measure performs relatively better (in terms of clustering accuracy) than both EST clustering tools. CONCLUSIONS/SIGNIFICANCE: The clustering results provide support for the effectiveness of the proposed hybrid distance measure in solving the clustering problem for ESTs that originate from the same gene family. The improvement of clustering accuracies on the experimental datasets has supported the claim that the sensitivity of the hybrid distance measure is sufficient to solve the clustering problem. Public Library of Science 2012-10-11 /pmc/articles/PMC3469558/ /pubmed/23071763 http://dx.doi.org/10.1371/journal.pone.0047216 Text en © 2012 Ng et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Ng, Keng-Hoong Ho, Chin-Kuan Phon-Amnuaisuk, Somnuk A Hybrid Distance Measure for Clustering Expressed Sequence Tags Originating from the Same Gene Family |
title | A Hybrid Distance Measure for Clustering Expressed Sequence Tags Originating from the Same Gene Family |
title_full | A Hybrid Distance Measure for Clustering Expressed Sequence Tags Originating from the Same Gene Family |
title_fullStr | A Hybrid Distance Measure for Clustering Expressed Sequence Tags Originating from the Same Gene Family |
title_full_unstemmed | A Hybrid Distance Measure for Clustering Expressed Sequence Tags Originating from the Same Gene Family |
title_short | A Hybrid Distance Measure for Clustering Expressed Sequence Tags Originating from the Same Gene Family |
title_sort | hybrid distance measure for clustering expressed sequence tags originating from the same gene family |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3469558/ https://www.ncbi.nlm.nih.gov/pubmed/23071763 http://dx.doi.org/10.1371/journal.pone.0047216 |
work_keys_str_mv | AT ngkenghoong ahybriddistancemeasureforclusteringexpressedsequencetagsoriginatingfromthesamegenefamily AT hochinkuan ahybriddistancemeasureforclusteringexpressedsequencetagsoriginatingfromthesamegenefamily AT phonamnuaisuksomnuk ahybriddistancemeasureforclusteringexpressedsequencetagsoriginatingfromthesamegenefamily AT ngkenghoong hybriddistancemeasureforclusteringexpressedsequencetagsoriginatingfromthesamegenefamily AT hochinkuan hybriddistancemeasureforclusteringexpressedsequencetagsoriginatingfromthesamegenefamily AT phonamnuaisuksomnuk hybriddistancemeasureforclusteringexpressedsequencetagsoriginatingfromthesamegenefamily |
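
The description field above outlines the approach only in words: a hybrid distance that combines global and local features of ESTs, with clustering done by DBSCAN. This record gives no formulas, so the following Python sketch is an illustration under stated assumptions and not the authors' method. The choice of short k-mer counts as the "global" feature, long shared words as the "local" feature, the equal `weight`, and all function names and default parameters are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def global_distance(a, b, k=3):
    """ASSUMED global feature: cosine distance between short k-mer counts."""
    ca, cb = Counter(kmers(a, k)), Counter(kmers(b, k))
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def local_distance(a, b, k=8):
    """ASSUMED local feature: 1 - Jaccard similarity of long-word sets."""
    sa, sb = set(kmers(a, k)), set(kmers(b, k))
    if not sa or not sb:
        return 1.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def hybrid_distance(a, b, weight=0.5):
    """Weighted mix of the two distances; the weight is an assumption."""
    return weight * global_distance(a, b) + (1 - weight) * local_distance(a, b)


def dbscan(items, dist, eps, min_pts):
    """Plain DBSCAN over an arbitrary distance function.

    Returns one label per item; -1 marks noise.
    """
    n = len(items)
    labels = [None] * n

    def neighbours(i):
        # Includes i itself, as in the usual DBSCAN definition.
        return [j for j in range(n) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbours(j)
            if len(nbj) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbj)
    return labels
```

A call such as `dbscan(reads, hybrid_distance, eps, min_pts)` takes a list of sequence strings and returns one cluster label per read. Suitable values of `eps`, `min_pts`, and `weight` depend on the data and would need tuning; nothing in this record specifies them.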