CLU: A new algorithm for EST clustering

BACKGROUND: The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually EST clustering, the process of grouping the original fragments according to their annotation, similarity to known genomic DNA...

Bibliographic Details
Main authors:	Ptitsyn, Andrey; Hide, Winston
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1637039/ https://www.ncbi.nlm.nih.gov/pubmed/16026600 http://dx.doi.org/10.1186/1471-2105-6-S2-S3

_version_	1782130782656004096
author	Ptitsyn, Andrey Hide, Winston
author_facet	Ptitsyn, Andrey Hide, Winston
author_sort	Ptitsyn, Andrey
collection	PubMed
description	BACKGROUND: The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually EST clustering, the process of grouping the original fragments according to their annotation, similarity to known genomic DNA or each other. Clustered EST data, accumulated in databases such as UniGene, STACK and TIGR Gene Indices, have proven to be crucial in research areas from gene discovery to regulation of gene expression. RESULTS: We have developed a new nucleotide sequence matching algorithm and its implementation for clustering EST sequences. The program is based on the original CLU match detection algorithm, which has improved performance over the widely used d2_cluster. The CLU algorithm automatically ignores low-complexity regions such as poly-tracts and short tandem repeats. CONCLUSION: CLU represents a new generation of EST clustering algorithms with improved performance over current approaches. An early implementation can be applied in small and medium-size projects. The CLU program is available on an open source basis free of charge. It can be downloaded from
format	Text
id	pubmed-1637039
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1637039 2006-11-16 CLU: A new algorithm for EST clustering Ptitsyn, Andrey Hide, Winston BMC Bioinformatics Proceedings BACKGROUND: The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually EST clustering, the process of grouping the original fragments according to their annotation, similarity to known genomic DNA or each other. Clustered EST data, accumulated in databases such as UniGene, STACK and TIGR Gene Indices, have proven to be crucial in research areas from gene discovery to regulation of gene expression. RESULTS: We have developed a new nucleotide sequence matching algorithm and its implementation for clustering EST sequences. The program is based on the original CLU match detection algorithm, which has improved performance over the widely used d2_cluster. The CLU algorithm automatically ignores low-complexity regions such as poly-tracts and short tandem repeats. CONCLUSION: CLU represents a new generation of EST clustering algorithms with improved performance over current approaches. An early implementation can be applied in small and medium-size projects. The CLU program is available on an open source basis free of charge. It can be downloaded from BioMed Central 2005-07-15 /pmc/articles/PMC1637039/ /pubmed/16026600 http://dx.doi.org/10.1186/1471-2105-6-S2-S3 Text en Copyright © 2006 Ptitsyn and Hide; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Proceedings Ptitsyn, Andrey Hide, Winston CLU: A new algorithm for EST clustering
title	CLU: A new algorithm for EST clustering
title_full	CLU: A new algorithm for EST clustering
title_fullStr	CLU: A new algorithm for EST clustering
title_full_unstemmed	CLU: A new algorithm for EST clustering
title_short	CLU: A new algorithm for EST clustering
title_sort	clu: a new algorithm for est clustering
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1637039/ https://www.ncbi.nlm.nih.gov/pubmed/16026600 http://dx.doi.org/10.1186/1471-2105-6-S2-S3
work_keys_str_mv	AT ptitsynandrey cluanewalgorithmforestclustering AT hidewinston cluanewalgorithmforestclustering

Similar items