
Making sense of EST sequences by CLOBBing them

BACKGROUND: Expressed sequence tags (ESTs) are single pass reads from randomly selected cDNA clones. They provide a highly cost-effective method to access and identify expressed genes. However, they are often prone to sequencing errors and typically define incomplete transcripts. To increase the amo...


Bibliographic Details
Main authors: Parkinson, John; Guiliano, David B; Blaxter, Mark
Format: Text
Language: English
Published: BioMed Central 2002
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC137596/
https://www.ncbi.nlm.nih.gov/pubmed/12398795
http://dx.doi.org/10.1186/1471-2105-3-31
author Parkinson, John
Guiliano, David B
Blaxter, Mark
author_facet Parkinson, John
Guiliano, David B
Blaxter, Mark
author_sort Parkinson, John
collection PubMed
description BACKGROUND: Expressed sequence tags (ESTs) are single pass reads from randomly selected cDNA clones. They provide a highly cost-effective method to access and identify expressed genes. However, they are often prone to sequencing errors and typically define incomplete transcripts. To increase the amount of information obtainable from ESTs and reduce sequencing errors, it is necessary to cluster ESTs into groups sharing significant sequence similarity. RESULTS: As part of our ongoing EST programs investigating 'orphan' genomes, we have developed a clustering algorithm, CLOBB (Cluster on the basis of BLAST similarity) to identify and cluster ESTs. CLOBB may be used incrementally, preserving original cluster designations. It tracks cluster-specific events such as merging, identifies 'superclusters' of related clusters and avoids the expansion of chimeric clusters. Based on the Perl scripting language, CLOBB is highly portable relying only on a local installation of NCBI's freely available BLAST executable and can be usefully applied to > 95 % of the current EST datasets. Analysis of the Danio rerio EST dataset demonstrates that CLOBB compares favourably with two less portable systems, UniGene and TIGR Gene Indices. CONCLUSIONS: CLOBB provides a highly portable EST clustering solution and is freely downloaded from: http://www.nematodes.org/CLOBB
format Text
id pubmed-137596
institution National Center for Biotechnology Information
language English
publishDate 2002
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1375962002-12-08 Making sense of EST sequences by CLOBBing them Parkinson, John Guiliano, David B Blaxter, Mark BMC Bioinformatics Research article BACKGROUND: Expressed sequence tags (ESTs) are single pass reads from randomly selected cDNA clones. They provide a highly cost-effective method to access and identify expressed genes. However, they are often prone to sequencing errors and typically define incomplete transcripts. To increase the amount of information obtainable from ESTs and reduce sequencing errors, it is necessary to cluster ESTs into groups sharing significant sequence similarity. RESULTS: As part of our ongoing EST programs investigating 'orphan' genomes, we have developed a clustering algorithm, CLOBB (Cluster on the basis of BLAST similarity) to identify and cluster ESTs. CLOBB may be used incrementally, preserving original cluster designations. It tracks cluster-specific events such as merging, identifies 'superclusters' of related clusters and avoids the expansion of chimeric clusters. Based on the Perl scripting language, CLOBB is highly portable relying only on a local installation of NCBI's freely available BLAST executable and can be usefully applied to > 95 % of the current EST datasets. Analysis of the Danio rerio EST dataset demonstrates that CLOBB compares favourably with two less portable systems, UniGene and TIGR Gene Indices. CONCLUSIONS: CLOBB provides a highly portable EST clustering solution and is freely downloaded from: http://www.nematodes.org/CLOBB BioMed Central 2002-10-25 /pmc/articles/PMC137596/ /pubmed/12398795 http://dx.doi.org/10.1186/1471-2105-3-31 Text en Copyright ©2002 Parkinson et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
spellingShingle Research article
Parkinson, John
Guiliano, David B
Blaxter, Mark
Making sense of EST sequences by CLOBBing them
title Making sense of EST sequences by CLOBBing them
topic Research article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC137596/
https://www.ncbi.nlm.nih.gov/pubmed/12398795
http://dx.doi.org/10.1186/1471-2105-3-31