
CAFTAN: a tool for fast mapping, and quality assessment of cDNAs


Bibliographic details
Main authors: del Val, Coral; Kuryshev, Vladimir Yurjevich; Glatting, Karl-Heinz; Ernst, Peter; Hotz-Wagenblatt, Agnes; Poustka, Annemarie; Suhai, Sandor; Wiemann, Stefan
Format: Text
Language: English
Published: BioMed Central, 2006
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1636072/
https://www.ncbi.nlm.nih.gov/pubmed/17064411
http://dx.doi.org/10.1186/1471-2105-7-473
collection PubMed
description BACKGROUND: The German cDNA Consortium has been cloning full-length cDNAs and has gone on to use them in protein localization experiments and cellular assays. However, making efficient use of large cDNA resources requires strategies that can rapidly separate truly useful cDNAs from biological and experimental noise. To this end we have developed a new high-throughput analysis tool, CAFTAN, which simplifies these efforts and thus fills the gap between large-scale cDNA collections and their systematic annotation and application in functional genomics.
RESULTS: CAFTAN is built around mapping cDNAs to the genome assembly and then analyzing their genomic context. It uses sequence features such as the presence and type of polyA signals, inner and flanking repeats, GC content, and splice site types. Each feature is evaluated in an individual test, and the test results are used to classify cDNAs by sequence quality and by the likelihood that they were generated from fully processed mRNAs. Additionally, CAFTAN compares the coordinates of mapped cDNAs with the genomic coordinates of reference sets from publicly available resources (e.g., VEGA, ENSEMBL). This provides detailed information about overlapping exons and a structural classification of each cDNA with respect to the reference set of splice variants. In the evaluation, CAFTAN correctly classified more than 85% of 5950 selected "known protein-coding" VEGA cDNAs as high-quality multi-exon or single-exon cDNAs. It classified 80.6% of the single-exon cDNAs and 85% of the multi-exon cDNAs as good. The program is written in Perl and is modular, so the strategy can be adapted to other tasks such as EST annotation, or extended with new classification rules and new organism databases as they become available. We consider it a useful program for annotating and studying unfinished genomes.
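The abstract lists the sequence features CAFTAN tests but not the exact rules. The Python sketch below shows how three of those individual tests (GC content, polyA signal, splice site type) might be computed and combined into a quality label. It is not CAFTAN's Perl code: the function names, the 50-base search window, the two-hexamer signal list (canonical AATAAA and the common ATTAAA variant), and the GC thresholds are all hypothetical choices for illustration, and the repeat tests are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed signal list: the canonical poly(A) hexamer and its most common variant.
POLYA_SIGNALS = {"AATAAA": "canonical", "ATTAAA": "variant"}

# Intron boundary dinucleotides (donor, acceptor) -> splice site type.
SPLICE_TYPES = {("GT", "AG"): "GT-AG", ("GC", "AG"): "GC-AG", ("AT", "AC"): "AT-AC"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def polya_signal(seq: str, window: int = 50) -> Optional[str]:
    """Type of poly(A) signal in the last `window` bases, or None if absent."""
    tail = seq.upper()[-window:]
    for hexamer in ("AATAAA", "ATTAAA"):  # check the canonical hexamer first
        if hexamer in tail:
            return POLYA_SIGNALS[hexamer]
    return None


def splice_type(intron: str) -> str:
    """Classify a genomic intron sequence by its first and last two bases."""
    key = (intron[:2].upper(), intron[-2:].upper())
    return SPLICE_TYPES.get(key, "non-canonical")


@dataclass
class CdnaFeatures:
    gc: float
    polya: Optional[str]
    splice_types: List[str]


def extract_features(cdna: str, introns: List[str]) -> CdnaFeatures:
    """Run each individual test on a cDNA and the genomic introns it spans."""
    return CdnaFeatures(gc_content(cdna), polya_signal(cdna), [splice_type(i) for i in introns])


def classify(f: CdnaFeatures, gc_range=(0.35, 0.65)) -> str:
    """Hypothetical rule: 'good' only if every individual test passes."""
    tests = [
        gc_range[0] <= f.gc <= gc_range[1],
        f.polya is not None,
        all(t != "non-canonical" for t in f.splice_types),
    ]
    exon_kind = "multi-exon" if f.splice_types else "single-exon"
    quality = "good" if all(tests) else "doubtful"
    return f"{quality} {exon_kind}"
```

For example, `classify(extract_features("GCGCATATAATAAAGC", ["GTAAAAG", "GCTTTAG"]))` returns `"good multi-exon"`: the GC fraction is 0.375, the canonical AATAAA signal is present, and both introns have canonical (GT-AG and GC-AG) boundaries. Keeping each test as a separate boolean mirrors the abstract's description of individually evaluated features and makes it easy to add new rules.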
CONCLUSION: CAFTAN is a high-throughput sequence analysis tool that performs fast and reliable quality prediction for cDNAs. Several thousand cDNAs can be analyzed in a short time, giving the curator or scientist a quick first overview of the quality and existing annotation of a set of cDNAs. It supports the rejection of low-quality cDNAs and helps select likely novel splice variants and/or completely novel transcripts for new experiments.
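The comparison of mapped cDNAs against reference sets (e.g., VEGA, ENSEMBL), which underlies the selection of likely novel splice variants, can be sketched in the same hedged way. The example below is hypothetical and not taken from CAFTAN: it represents each transcript as a list of 1-based, inclusive exon coordinates on one chromosome strand and labels a cDNA by comparing its intron chain with those of overlapping reference transcripts. The labels and their priority order are assumptions; the abstract does not specify CAFTAN's actual structural classes.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive, same chromosome and strand


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two exon intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Intron chain implied by consecutive exons (sorted by start)."""
    ordered = sorted(exons)
    return [(ordered[i][1] + 1, ordered[i + 1][0] - 1) for i in range(len(ordered) - 1)]


def classify_against_reference(cdna: List[Exon], references: Dict[str, List[Exon]]) -> Tuple[str, str]:
    """Return (hypothetical label, reference id or '') for the best-matching reference."""
    cdna_introns = introns(cdna)
    best = ("novel locus", "")
    for ref_id, ref_exons in references.items():
        # Ignore reference transcripts that share no exonic bases with the cDNA.
        if not any(overlaps(e, r) for e in cdna for r in ref_exons):
            continue
        ref_introns = introns(ref_exons)
        if cdna_introns == ref_introns:
            return ("same intron chain", ref_id)  # terminal exon ends may still differ
        if set(cdna_introns) <= set(ref_introns):
            best = ("subset of reference", ref_id)
        elif best[0] == "novel locus":
            best = ("possible novel splice variant", ref_id)
    return best
```

With a reference `{"REF1": [(100, 200), (300, 400), (500, 600)]}`, a cDNA with exons `[(100, 200), (500, 600)]` skips the middle exon and is labeled `("possible novel splice variant", "REF1")`, while `[(1000, 1100)]` overlaps nothing and is labeled `("novel locus", "")`. The subset check ignores intron order and contiguity, which a production tool would need to handle more carefully.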
id pubmed-1636072
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics (Software), BioMed Central, 2006-10-25 (record pubmed-1636072, 2006-11-15)
rights Copyright © 2006 del Val et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Software