
An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets

BACKGROUND: The data produced by an Illumina flow cell with all eight lanes occupied produces well over a terabyte worth of images with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. Very...


Bibliographic Details
Main authors:	Hosseini, Parsa, Tremblay, Arianne, Matthews, Benjamin F, Alkharouf, Nadim W
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Technical Note
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2908109/ https://www.ncbi.nlm.nih.gov/pubmed/20598141 http://dx.doi.org/10.1186/1756-0500-3-183

author	Hosseini, Parsa; Tremblay, Arianne; Matthews, Benjamin F; Alkharouf, Nadim W
collection	PubMed
description	BACKGROUND: The data produced by an Illumina flow cell with all eight lanes occupied produces well over a terabyte worth of images with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. One can very easily be flooded by this volume of textual, unannotated data, irrespective of read quality or size. CASAVA, an optional analysis tool for Illumina sequencing experiments, provides INDEL detection, SNP information, and allele calling. It is therefore of significant value not only to extract from such analysis a measure of gene expression in the form of tag counts, but also to annotate the reads themselves. FINDINGS: We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool designed specifically for Illumina CASAVA sequencing datasets. Written in Java and deployed with the jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA-build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag counting and annotation. The output contains the homology-based functional annotation and the corresponding gene expression measure, which records how many times sequenced reads were found within the genomic ranges of the functional annotations. CONCLUSIONS: TASE is a powerful tool for annotating a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis complete in very efficient times, allowing researchers to delve deep into a given CASAVA-build and maximize the information extracted from a sequencing dataset.
TASE is specifically designed to translate sequence data in a CASAVA-build into functional annotations while producing corresponding gene expression measurements. The analysis runs in an ultrafast and highly efficient manner for both single-read and paired-end sequencing experiments. TASE is a user-friendly, freely available application for rapid analysis and annotation of any given Illumina Solexa sequencing dataset.
format	Text
id	pubmed-2908109
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
journal	BMC Res Notes
published	2010-07-02
rights	Copyright ©2010 Alkharouf et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2908109/ https://www.ncbi.nlm.nih.gov/pubmed/20598141 http://dx.doi.org/10.1186/1756-0500-3-183
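The abstract defines the tag count as how many sequenced reads were found within the genomic range of each functional annotation. Below is a minimal Java sketch of that counting step only. It is not TASE's implementation. It makes several assumptions: each read is reduced to a chromosome and its leftmost alignment position, each annotation is an inclusive 1-based range, and the data arrive as in-memory lists rather than from the SQL Server database that TASE queries through jTDS. The class, record, and method names (`TagCounter`, `Annotation`, `AlignedRead`, `countTags`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of tag counting; not TASE's actual code or schema.
public class TagCounter {

    // Assumed shape of a functional annotation: inclusive range [start, end] on one chromosome.
    public record Annotation(String chrom, int start, int end, String function) {}

    // Assumed shape of an aligned read: only its chromosome and leftmost position are kept.
    public record AlignedRead(String chrom, int position) {}

    // Index of the first element in sorted that is >= key.
    private static int firstAtLeast(int[] sorted, int key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Index of the first element in sorted that is > key.
    private static int firstGreaterThan(int[] sorted, int key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Returns, for each annotation (same order as the input list), the number of reads
    // whose position lies in [start, end]. A read inside overlapping annotations counts for each.
    public static int[] countTags(List<Annotation> annotations, List<AlignedRead> reads) {
        // Group read positions by chromosome.
        Map<String, List<Integer>> byChrom = new HashMap<>();
        for (AlignedRead r : reads) {
            byChrom.computeIfAbsent(r.chrom(), k -> new ArrayList<>()).add(r.position());
        }
        // Sort each chromosome's positions once so every range query is two binary searches.
        Map<String, int[]> sortedByChrom = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : byChrom.entrySet()) {
            sortedByChrom.put(e.getKey(),
                e.getValue().stream().mapToInt(Integer::intValue).sorted().toArray());
        }
        int[] counts = new int[annotations.size()];
        for (int i = 0; i < annotations.size(); i++) {
            Annotation a = annotations.get(i);
            int[] positions = sortedByChrom.get(a.chrom());
            if (positions == null) continue; // no reads on this chromosome: count stays 0
            counts[i] = firstGreaterThan(positions, a.end()) - firstAtLeast(positions, a.start());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Annotation> annotations = List.of(
            new Annotation("chr1", 100, 200, "putative kinase"),
            new Annotation("chr1", 150, 300, "hypothetical protein"),
            new Annotation("chr2", 10, 20, "transporter"));
        List<AlignedRead> reads = List.of(
            new AlignedRead("chr1", 120), new AlignedRead("chr1", 160),
            new AlignedRead("chr1", 250), new AlignedRead("chr2", 5));
        int[] counts = countTags(annotations, reads);
        for (int i = 0; i < counts.length; i++) {
            // One tab-separated line per annotation: function, tag count.
            System.out.println(annotations.get(i).function() + "\t" + counts[i]);
        }
    }
}
```

Running `main` prints a count of 2 for each chr1 annotation, because the read at 160 falls inside both overlapping ranges, and 0 for the chr2 annotation. The code sorts positions once per chromosome, so each annotation costs two binary searches. That is roughly O((R + A) log R) for R reads and A annotations, compared with O(R × A) for checking every read against every range. A rule that counts partially overlapping reads, or paired-end fragments as single units, would need a different assignment step.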


Similar items