SeqWare Query Engine: storing and searching sequence data in the cloud

BACKGROUND: Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome re...

Bibliographic Details
Main Authors:	O’Connor, Brian D; Merriman, Barry; Nelson, Stanley F
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3040528/ https://www.ncbi.nlm.nih.gov/pubmed/21210981 http://dx.doi.org/10.1186/1471-2105-11-S12-S2

author	O’Connor, Brian D; Merriman, Barry; Nelson, Stanley F
author_sort	O’Connor, Brian D
collection	PubMed
description	BACKGROUND: Since the introduction of next-generation DNA sequencers, the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever-increasing demands. RESULTS: In this work, we present the SeqWare Query Engine, which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc.) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). CONCLUSIONS: The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets.
format	Text
id	pubmed-3040528
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3040528 2011-02-18 SeqWare Query Engine: storing and searching sequence data in the cloud O’Connor, Brian D; Merriman, Barry; Nelson, Stanley F BMC Bioinformatics Proceedings BioMed Central 2010-12-21 /pmc/articles/PMC3040528/ /pubmed/21210981 http://dx.doi.org/10.1186/1471-2105-11-S12-S2 Text en Copyright ©2010 O'Connor et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	SeqWare Query Engine: storing and searching sequence data in the cloud
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3040528/ https://www.ncbi.nlm.nih.gov/pubmed/21210981 http://dx.doi.org/10.1186/1471-2105-11-S12-S2
