
ReadDB Provides Efficient Storage for Mapped Short Reads

Bibliographic Details
Main authors:	Rolfe, P Alexander; Gifford, David K
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3143109/ https://www.ncbi.nlm.nih.gov/pubmed/21736741 http://dx.doi.org/10.1186/1471-2105-12-278

author	Rolfe, P Alexander; Gifford, David K
author_sort	Rolfe, P Alexander
collection	PubMed
description	BACKGROUND: The advent of high-throughput sequencing has enabled sequencing-based measurements of cellular function, with an individual measurement potentially consisting of more than 10^8 reads. While tools are available for aligning sets of reads to genomes and interpreting the results, fewer tools have been developed to address the storage and retrieval requirements of large collections of aligned datasets. We present ReadDB, a network-accessible column-store database system for aligned high-throughput read datasets. RESULTS: ReadDB stores collections of aligned read positions and provides a client interface to support visualization and analysis. ReadDB is implemented as a network server that responds to queries on genomic intervals in an experiment with either the set of contained reads or a histogram-based interval summary. Tests on datasets ranging from 10^5 to 10^8 reads demonstrate that ReadDB performance is generally within a factor of two of local-storage-based methods and often three to five times better than other network-based methods. CONCLUSIONS: ReadDB is a high-performance foundation for ChIP-Seq and RNA-Seq analysis. The client-server model provides convenient access from compute cluster nodes or desktop visualization software without requiring a shared network filesystem or large amounts of local storage. The client code provides a simple interface for fast data access for visualization or analysis. ReadDB provides a new way to store genome-aligned reads for use in applications where read sequences and alignment mismatches are not needed.
format	Online Article Text
id	pubmed-3143109
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3143109 2011-07-26 ReadDB Provides Efficient Storage for Mapped Short Reads Rolfe, P Alexander; Gifford, David K BMC Bioinformatics Software BACKGROUND: The advent of high-throughput sequencing has enabled sequencing-based measurements of cellular function, with an individual measurement potentially consisting of more than 10^8 reads. While tools are available for aligning sets of reads to genomes and interpreting the results, fewer tools have been developed to address the storage and retrieval requirements of large collections of aligned datasets. We present ReadDB, a network-accessible column-store database system for aligned high-throughput read datasets. RESULTS: ReadDB stores collections of aligned read positions and provides a client interface to support visualization and analysis. ReadDB is implemented as a network server that responds to queries on genomic intervals in an experiment with either the set of contained reads or a histogram-based interval summary. Tests on datasets ranging from 10^5 to 10^8 reads demonstrate that ReadDB performance is generally within a factor of two of local-storage-based methods and often three to five times better than other network-based methods. CONCLUSIONS: ReadDB is a high-performance foundation for ChIP-Seq and RNA-Seq analysis. The client-server model provides convenient access from compute cluster nodes or desktop visualization software without requiring a shared network filesystem or large amounts of local storage. The client code provides a simple interface for fast data access for visualization or analysis. ReadDB provides a new way to store genome-aligned reads for use in applications where read sequences and alignment mismatches are not needed. BioMed Central 2011-07-07 /pmc/articles/PMC3143109/ /pubmed/21736741 http://dx.doi.org/10.1186/1471-2105-12-278 Text en Copyright © 2011 Rolfe and Gifford; licensee BioMed Central Ltd.
https://creativecommons.org/licenses/by/2.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	ReadDB Provides Efficient Storage for Mapped Short Reads
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3143109/ https://www.ncbi.nlm.nih.gov/pubmed/21736741 http://dx.doi.org/10.1186/1471-2105-12-278
work_keys_str_mv	AT rolfepalexander readdbprovidesefficientstorageformappedshortreads AT gifforddavidk readdbprovidesefficientstorageformappedshortreads
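The description field says the ReadDB server answers a query on a genomic interval with either the reads it contains or a histogram summary of that interval. The Python sketch below illustrates that query model under stated assumptions. Each read is reduced to a single aligned position, and positions are kept in one sorted column per chromosome and strand so that an interval lookup becomes two binary searches. The class and method names (`ReadStore`, `add_reads`, `query_reads`, `histogram`) are hypothetical. This is not ReadDB's client API, wire protocol, or on-disk format.

```python
from bisect import bisect_left
from collections import defaultdict


class ReadStore:
    """Hypothetical in-memory stand-in for a column store of aligned read positions.

    Assumption: each (chromosome, strand) pair owns one sorted column of
    integer positions. This is an illustration, not ReadDB's actual format.
    """

    def __init__(self):
        # (chrom, strand) -> sorted list of aligned read positions
        self._columns = defaultdict(list)

    def add_reads(self, chrom, strand, positions):
        col = self._columns[(chrom, strand)]
        col.extend(positions)
        col.sort()  # keep the column sorted so queries can binary-search it

    def query_reads(self, chrom, strand, start, end):
        """Return every stored position p with start <= p < end."""
        col = self._columns.get((chrom, strand), [])
        lo = bisect_left(col, start)  # first position >= start
        hi = bisect_left(col, end)    # first position >= end
        return col[lo:hi]

    def histogram(self, chrom, strand, start, end, bin_size):
        """Summarize [start, end) as read counts in fixed-width bins."""
        n_bins = (end - start + bin_size - 1) // bin_size  # ceiling division
        counts = [0] * n_bins
        for p in self.query_reads(chrom, strand, start, end):
            counts[(p - start) // bin_size] += 1
        return counts
```

For example, after `store.add_reads("chr1", "+", [420, 100, 151, 150, 999])`, the call `store.query_reads("chr1", "+", 100, 421)` returns `[100, 150, 151, 420]`, and `store.histogram("chr1", "+", 100, 500, 100)` returns `[3, 0, 0, 1]`. Keeping positions sorted means a query costs O(log n + k) for k returned reads. A histogram response lets a visualization client draw a wide region without transferring every read.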
