Making sense of EST sequences by CLOBBing them
BACKGROUND: Expressed sequence tags (ESTs) are single pass reads from randomly selected cDNA clones. They provide a highly cost-effective method to access and identify expressed genes. However, they are often prone to sequencing errors and typically define incomplete transcripts. To increase the amo...
Main authors: | Parkinson, John; Guiliano, David B; Blaxter, Mark |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2002 |
Subjects: | Research article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC137596/ https://www.ncbi.nlm.nih.gov/pubmed/12398795 http://dx.doi.org/10.1186/1471-2105-3-31 |
_version_ | 1782120455267680256 |
---|---|
author | Parkinson, John Guiliano, David B Blaxter, Mark |
author_facet | Parkinson, John Guiliano, David B Blaxter, Mark |
author_sort | Parkinson, John |
collection | PubMed |
description | BACKGROUND: Expressed sequence tags (ESTs) are single-pass reads from randomly selected cDNA clones. They provide a highly cost-effective method to access and identify expressed genes. However, they are often prone to sequencing errors and typically define incomplete transcripts. To increase the amount of information obtainable from ESTs and reduce sequencing errors, it is necessary to cluster ESTs into groups sharing significant sequence similarity. RESULTS: As part of our ongoing EST programs investigating 'orphan' genomes, we have developed a clustering algorithm, CLOBB (Cluster on the basis of BLAST similarity), to identify and cluster ESTs. CLOBB may be used incrementally, preserving original cluster designations. It tracks cluster-specific events such as merging, identifies 'superclusters' of related clusters and avoids the expansion of chimeric clusters. Based on the Perl scripting language, CLOBB is highly portable, relying only on a local installation of NCBI's freely available BLAST executable, and can be usefully applied to >95% of the current EST datasets. Analysis of the Danio rerio EST dataset demonstrates that CLOBB compares favourably with two less portable systems, UniGene and TIGR Gene Indices. CONCLUSIONS: CLOBB provides a highly portable EST clustering solution and can be freely downloaded from: http://www.nematodes.org/CLOBB |
format | Text |
id | pubmed-137596 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2002 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1375962002-12-08 Making sense of EST sequences by CLOBBing them Parkinson, John Guiliano, David B Blaxter, Mark BMC Bioinformatics Research article BACKGROUND: Expressed sequence tags (ESTs) are single pass reads from randomly selected cDNA clones. They provide a highly cost-effective method to access and identify expressed genes. However, they are often prone to sequencing errors and typically define incomplete transcripts. To increase the amount of information obtainable from ESTs and reduce sequencing errors, it is necessary to cluster ESTs into groups sharing significant sequence similarity. RESULTS: As part of our ongoing EST programs investigating 'orphan' genomes, we have developed a clustering algorithm, CLOBB (Cluster on the basis of BLAST similarity) to identify and cluster ESTs. CLOBB may be used incrementally, preserving original cluster designations. It tracks cluster-specific events such as merging, identifies 'superclusters' of related clusters and avoids the expansion of chimeric clusters. Based on the Perl scripting language, CLOBB is highly portable relying only on a local installation of NCBI's freely available BLAST executable and can be usefully applied to > 95 % of the current EST datasets. Analysis of the Danio rerio EST dataset demonstrates that CLOBB compares favourably with two less portable systems, UniGene and TIGR Gene Indices. CONCLUSIONS: CLOBB provides a highly portable EST clustering solution and is freely downloaded from: http://www.nematodes.org/CLOBB BioMed Central 2002-10-25 /pmc/articles/PMC137596/ /pubmed/12398795 http://dx.doi.org/10.1186/1471-2105-3-31 Text en Copyright ©2002 Parkinson et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL. |
spellingShingle | Research article Parkinson, John Guiliano, David B Blaxter, Mark Making sense of EST sequences by CLOBBing them |
title | Making sense of EST sequences by CLOBBing them |
title_full | Making sense of EST sequences by CLOBBing them |
title_fullStr | Making sense of EST sequences by CLOBBing them |
title_full_unstemmed | Making sense of EST sequences by CLOBBing them |
title_short | Making sense of EST sequences by CLOBBing them |
title_sort | making sense of est sequences by clobbing them |
topic | Research article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC137596/ https://www.ncbi.nlm.nih.gov/pubmed/12398795 http://dx.doi.org/10.1186/1471-2105-3-31 |
work_keys_str_mv | AT parkinsonjohn makingsenseofestsequencesbyclobbingthem AT guilianodavidb makingsenseofestsequencesbyclobbingthem AT blaxtermark makingsenseofestsequencesbyclobbingthem |
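
The abstract in this record describes incremental clustering in which existing cluster designations are preserved and merge events are tracked. The idea can be illustrated with a minimal Python sketch. This is not the CLOBB implementation (which the record says is written in Perl and relies on BLAST): the class name, the `min_score` cut-off, and the `(other_id, score)` hit format are all hypothetical, and supercluster detection and chimera handling are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalClusterer:
    """Toy similarity-based clusterer; an illustration, not CLOBB itself."""
    min_score: float = 100.0                        # hypothetical similarity cut-off
    cluster_of: dict = field(default_factory=dict)  # sequence id -> cluster id
    members: dict = field(default_factory=dict)     # cluster id -> set of sequence ids
    merge_log: list = field(default_factory=list)   # (kept id, absorbed id) events
    _next_id: int = 0

    def add(self, seq_id, hits):
        """Add one sequence, given (other_seq_id, score) similarity hits."""
        # Clusters of already-known sequences that pass the cut-off.
        matched = sorted({
            self.cluster_of[other]
            for other, score in hits
            if score >= self.min_score and other in self.cluster_of
        })
        if not matched:
            # No significant similarity: open a new cluster.
            cid = self._next_id
            self._next_id += 1
            self.members[cid] = set()
        else:
            # Keep the oldest designation and fold the others into it.
            cid = matched[0]
            for absorbed in matched[1:]:
                for member in self.members.pop(absorbed):
                    self.cluster_of[member] = cid
                    self.members[cid].add(member)
                self.merge_log.append((cid, absorbed))
        self.cluster_of[seq_id] = cid
        self.members[cid].add(seq_id)
        return cid


clusterer = IncrementalClusterer()
clusterer.add("est1", [])
clusterer.add("est2", [])
# est3 is similar to both earlier sequences, so their clusters merge.
clusterer.add("est3", [("est1", 250.0), ("est2", 180.0)])
print(clusterer.members, clusterer.merge_log)
```

In this sketch a new sequence that bridges two clusters always merges them, and the lower-numbered identifier is kept so that earlier designations stay stable across incremental runs. A real tool would need additional checks before merging, since the abstract notes that CLOBB avoids expanding chimeric clusters.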