Geoseq: a tool for dissecting deep-sequencing datasets

Bibliographic Details
Main authors: Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2972303/
https://www.ncbi.nlm.nih.gov/pubmed/20939882
http://dx.doi.org/10.1186/1471-2105-11-506
_version_ 1782190785675919360
author Gurtowski, James
Cancio, Anthony
Shah, Hardik
Levovitz, Chaya
George, Ajish
Homann, Robert
Sachidanandam, Ravi
author_facet Gurtowski, James
Cancio, Anthony
Shah, Hardik
Levovitz, Chaya
George, Ajish
Homann, Robert
Sachidanandam, Ravi
author_sort Gurtowski, James
collection PubMed
description BACKGROUND: Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, and the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much because datasets of interest are difficult to locate and analyze. RESULTS: Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep-sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or search by gene/miRNA name, and gets back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. CONCLUSIONS: Geoseq simplifies the analysis of small sets of sequences against deep-sequencing datasets, as well as the identification of public datasets of interest. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
format Text
id pubmed-2972303
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2972303 2010-11-05 Geoseq: a tool for dissecting deep-sequencing datasets Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi BMC Bioinformatics Software
BioMed Central 2010-10-12 /pmc/articles/PMC2972303/ /pubmed/20939882 http://dx.doi.org/10.1186/1471-2105-11-506 Text en Copyright ©2010 Gurtowski et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Gurtowski, James
Cancio, Anthony
Shah, Hardik
Levovitz, Chaya
George, Ajish
Homann, Robert
Sachidanandam, Ravi
Geoseq: a tool for dissecting deep-sequencing datasets
title Geoseq: a tool for dissecting deep-sequencing datasets
title_full Geoseq: a tool for dissecting deep-sequencing datasets
title_fullStr Geoseq: a tool for dissecting deep-sequencing datasets
title_full_unstemmed Geoseq: a tool for dissecting deep-sequencing datasets
title_short Geoseq: a tool for dissecting deep-sequencing datasets
title_sort geoseq: a tool for dissecting deep-sequencing datasets
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2972303/
https://www.ncbi.nlm.nih.gov/pubmed/20939882
http://dx.doi.org/10.1186/1471-2105-11-506
work_keys_str_mv AT gurtowskijames geoseqatoolfordissectingdeepsequencingdatasets
AT cancioanthony geoseqatoolfordissectingdeepsequencingdatasets
AT shahhardik geoseqatoolfordissectingdeepsequencingdatasets
AT levovitzchaya geoseqatoolfordissectingdeepsequencingdatasets
AT georgeajish geoseqatoolfordissectingdeepsequencingdatasets
AT homannrobert geoseqatoolfordissectingdeepsequencingdatasets
AT sachidanandamravi geoseqatoolfordissectingdeepsequencingdatasets
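
The description field says that the analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The sketch below illustrates that general idea on a toy library; it is not Geoseq's implementation. The function names, the tile size and step, the `$` read separator, the forward-strand-only exact matching, and the naive suffix-array construction are all assumptions made for illustration.

```python
def build_suffix_array(text):
    """Return suffix start positions in lexicographic order of the suffixes.

    Naive construction (sorts full suffixes); adequate only for toy inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(text, sa, pattern, upper):
    """Binary search over the suffix array.

    Returns the first index whose length-k prefix is >= pattern (upper=False)
    or > pattern (upper=True), where k = len(pattern).
    """
    k = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + k]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count_occurrences(text, sa, pattern):
    """Count exact occurrences of pattern in text using the suffix array."""
    return _bound(text, sa, pattern, True) - _bound(text, sa, pattern, False)


def tile_coverage(query, reads, tile=4, step=1):
    """Cut query into overlapping tiles and count each tile in the read library.

    Reads are joined with '$' (assumed absent from the sequences) so that a
    tile can never match across a read boundary.
    Returns a list of (offset, tile_sequence, count) tuples.
    """
    text = "$".join(reads)
    sa = build_suffix_array(text)
    coverage = []
    for offset in range(0, len(query) - tile + 1, step):
        piece = query[offset:offset + tile]
        coverage.append((offset, piece, count_occurrences(text, sa, piece)))
    return coverage


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "GTACGG", "TTTT"]  # hypothetical read library
    for offset, piece, count in tile_coverage("ACGTACGG", toy_reads):
        print(offset, piece, count)
```

Building the suffix array once per library and reusing it for every query fits the record's statement that Geoseq holds pre-computed data from public libraries. A production index would use a linear-time construction and would likely also handle reverse complements, neither of which is shown here.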