
Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions

Next-generation sequencing (NGS) technologies have deeply changed our understanding of cellular processes by delivering an astonishing amount of data at affordable prices; nowadays, many biology laboratories have already accumulated a large number of sequenced samples. However, managing and analyzin...


Bibliographic Details
Main authors: Bianchi, Valerio; Ceol, Arnaud; Ogier, Alessandro G. E.; de Pretis, Stefano; Galeota, Eugenia; Kishore, Kamal; Bora, Pranami; Croci, Ottavio; Campaner, Stefano; Amati, Bruno; Morelli, Marco J.; Pelizzola, Mattia
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4858535/
https://www.ncbi.nlm.nih.gov/pubmed/27200084
http://dx.doi.org/10.3389/fgene.2016.00075
author Bianchi, Valerio
Ceol, Arnaud
Ogier, Alessandro G. E.
de Pretis, Stefano
Galeota, Eugenia
Kishore, Kamal
Bora, Pranami
Croci, Ottavio
Campaner, Stefano
Amati, Bruno
Morelli, Marco J.
Pelizzola, Mattia
author_sort Bianchi, Valerio
collection PubMed
description Next-generation sequencing (NGS) technologies have deeply changed our understanding of cellular processes by delivering an astonishing amount of data at affordable prices; many biology laboratories have already accumulated a large number of sequenced samples. However, managing and analyzing these data pose new challenges, which may easily be underestimated by research groups lacking IT and quantitative skills. In this perspective, we identify five issues that should be carefully addressed by research groups approaching NGS technologies. These issues concern: (1) adopting a laboratory information management system (LIMS) and safeguarding the resulting raw data structure in downstream analyses; (2) monitoring the flow of the data and standardizing input and output directories and file names, even when multiple analysis protocols are used on the same data; (3) ensuring complete traceability of the analyses performed; (4) enabling inexperienced users to run analyses through a graphical user interface (GUI) acting as a front-end for the pipelines; (5) relying on standard metadata to annotate the datasets and, when possible, using controlled vocabularies, ideally derived from biomedical ontologies. Finally, we discuss the currently available tools in the light of these issues, and we introduce HTS-flow, a new workflow management system conceived to address the concerns we raised. HTS-flow retrieves information from a LIMS database, manages data analyses through a simple GUI, outputs data in standard locations, and allows complete traceability of datasets, accompanying metadata, and analysis scripts.
format Online
Article
Text
id pubmed-4858535
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-4858535 2016-05-19 Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions Front Genet Genetics Frontiers Media S.A. 2016-05-06 /pmc/articles/PMC4858535/ /pubmed/27200084 http://dx.doi.org/10.3389/fgene.2016.00075 Text en Copyright © 2016 Bianchi, Ceol, Ogier, de Pretis, Galeota, Kishore, Bora, Croci, Campaner, Amati, Morelli and Pelizzola. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Integrated Systems for NGS Data Management and Analysis: Open Issues and Available Solutions
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4858535/
https://www.ncbi.nlm.nih.gov/pubmed/27200084
http://dx.doi.org/10.3389/fgene.2016.00075