SMITH: a LIMS for handling next-generation sequencing workflows

Bibliographic Details
Main authors: Venco, Francesco; Vaskin, Yuriy; Ceol, Arnaud; Muller, Heiko
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255740/
https://www.ncbi.nlm.nih.gov/pubmed/25471934
http://dx.doi.org/10.1186/1471-2105-15-S14-S3
author Venco, Francesco
Vaskin, Yuriy
Ceol, Arnaud
Muller, Heiko
collection PubMed
description BACKGROUND: Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality, and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, maintain a high quality standard, and reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. To manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling).
METHODS: SMITH is a web application with a MySQL server at the backend. It was developed by wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project. The database schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows an unconstrained textual description to be associated with each sample and with all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses.
RESULTS: SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized, making it easier for biologists and analysts to navigate the data. Automation also saves time. The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine, which performs de-multiplexing, quality control, alignments, etc.
CONCLUSIONS: SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis.
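The attribute-value annotation mentioned in the abstract can be illustrated with a minimal sketch. This is not SMITH's actual schema: the table names (`sample`, `attribute_value`), the column names, and the example attributes are hypothetical, and Python's built-in `sqlite3` module stands in for the MySQL backend so that the example runs without a server.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical attribute-value table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sample (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- One row per (sample, attribute) pair; attributes are free text,
        -- so new kinds of metadata need no schema change.
        CREATE TABLE attribute_value (
            sample_id INTEGER NOT NULL REFERENCES sample(id),
            attribute TEXT NOT NULL,
            value     TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO sample (id, name) VALUES (?, ?)",
        [(1, "S1"), (2, "S2"), (3, "S3")],
    )
    conn.executemany(
        "INSERT INTO attribute_value (sample_id, attribute, value) VALUES (?, ?, ?)",
        [
            (1, "application", "ChIP-Seq"),
            (1, "antibody", "H3K4me3"),
            (2, "application", "RNA-Seq"),
            (3, "application", "ChIP-Seq"),
            (3, "antibody", "H3K27ac"),
        ],
    )
    return conn


def find_samples(conn, attribute, value):
    """Search: names of samples annotated with the given attribute/value pair."""
    rows = conn.execute(
        "SELECT s.name FROM sample s "
        "JOIN attribute_value av ON av.sample_id = s.id "
        "WHERE av.attribute = ? AND av.value = ? "
        "ORDER BY s.name",
        (attribute, value),
    )
    return [row[0] for row in rows]


def count_by_value(conn, attribute):
    """Simple statistic: number of samples per value of one attribute."""
    rows = conn.execute(
        "SELECT value, COUNT(*) FROM attribute_value "
        "WHERE attribute = ? GROUP BY value ORDER BY value",
        (attribute,),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_samples(conn, "application", "ChIP-Seq"))
    print(count_by_value(conn, "application"))
```

The same two queries cover both uses named in the abstract: looking up specific samples or files by their metadata, and aggregating over the annotations for statistical analysis.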
format Online
Article
Text
id pubmed-4255740
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4255740 2014-12-05 SMITH: a LIMS for handling next-generation sequencing workflows. Venco, Francesco; Vaskin, Yuriy; Ceol, Arnaud; Muller, Heiko. BMC Bioinformatics. Research. BioMed Central 2014-11-27 /pmc/articles/PMC4255740/ /pubmed/25471934 http://dx.doi.org/10.1186/1471-2105-15-S14-S3 Text en
Copyright © 2014 Venco et al.; licensee BioMed Central. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SMITH: a LIMS for handling next-generation sequencing workflows
topic Research