
StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics

Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand...


Bibliographic Details
Main authors: Ramirez-Gonzalez, Ricardo H., Leggett, Richard M., Waite, Darren, Thanki, Anil, Drou, Nizar, Caccamo, Mario, Davey, Robert
Format: Online Article Text
Language: English
Published: F1000Research 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3938176/
https://www.ncbi.nlm.nih.gov/pubmed/24627795
http://dx.doi.org/10.12688/f1000research.2-248.v2
author Ramirez-Gonzalez, Ricardo H.
Leggett, Richard M.
Waite, Darren
Thanki, Anil
Drou, Nizar
Caccamo, Mario
Davey, Robert
collection PubMed
description Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. As well as the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. “provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month”. The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages.
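The cross-run query quoted in the description can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the table run_metric, its columns, the metric name gc_percent (standing in for a nucleotide bias measure) and the helper functions are hypothetical and are not the StatsDB schema or API. An in-memory SQLite database is used only so the example runs on its own.

```python
import sqlite3
from datetime import date, timedelta


def build_demo_db():
    """Create an in-memory database with a hypothetical metrics table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE run_metric (
               instrument TEXT,   -- sequencer identifier (hypothetical)
               barcode    TEXT,   -- adaptor barcode (hypothetical)
               run_date   TEXT,   -- ISO 8601 date, so string order is date order
               metric     TEXT,   -- metric name, e.g. 'gc_percent'
               value      REAL
           )"""
    )
    rows = [
        ("A", "X", "2014-02-10", "gc_percent", 40.0),
        ("A", "X", "2014-02-15", "gc_percent", 44.0),
        ("A", "X", "2013-12-01", "gc_percent", 60.0),  # outside the time window
        ("B", "X", "2014-02-12", "gc_percent", 50.0),  # different sequencer
        ("A", "Y", "2014-02-12", "gc_percent", 55.0),  # different barcode
    ]
    conn.executemany("INSERT INTO run_metric VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def mean_metric(conn, metric, instrument, barcode, since):
    """Return (mean value, number of rows) for one metric, filtered by
    sequencer, barcode and earliest run date. The mean is None if no row matches."""
    return conn.execute(
        """SELECT AVG(value), COUNT(*)
             FROM run_metric
            WHERE metric = ? AND instrument = ? AND barcode = ?
              AND run_date >= ?""",
        (metric, instrument, barcode, since.isoformat()),
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    today = date(2014, 2, 19)  # fixed date so the demo is reproducible
    since = today - timedelta(days=30)
    mean, count = mean_metric(conn, "gc_percent", "A", "X", since)
    print(f"mean gc_percent over {count} runs since {since}: {mean}")
```

The point of the sketch is that a single parameterised query replaces opening each per-run report by hand; a wrapper function of this kind is one way an API can hide the table layout from the caller.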
format Online
Article
Text
id pubmed-3938176
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher F1000Research
record_format MEDLINE/PubMed
spelling pubmed-3938176 2014-03-12 StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics Ramirez-Gonzalez, Ricardo H. Leggett, Richard M. Waite, Darren Thanki, Anil Drou, Nizar Caccamo, Mario Davey, Robert F1000Res Web Tool F1000Research 2014-02-19 /pmc/articles/PMC3938176/ /pubmed/24627795 http://dx.doi.org/10.12688/f1000research.2-248.v2 Text en Copyright: © 2014 Ramirez-Gonzalez RH et al. http://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. http://creativecommons.org/licenses/by/3.0/ Data associated with the article are available under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original data is properly cited.
title StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics
topic Web Tool
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3938176/
https://www.ncbi.nlm.nih.gov/pubmed/24627795
http://dx.doi.org/10.12688/f1000research.2-248.v2