iRODS metadata management for a cancer genome analysis workflow

BACKGROUND: The massive amounts of data from next generation sequencing (NGS) methods pose various challenges with respect to data security, storage and metadata management. While there is a broad range of data analysis pipelines, these challenges remain largely unaddressed to date. RESULTS: We desc...

Bibliographic Details
Main authors: Nieroda, Lech; Maas, Lukas; Thiebes, Scott; Lang, Ulrich; Sunyaev, Ali; Achter, Viktor; Peifer, Martin
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6334444/
https://www.ncbi.nlm.nih.gov/pubmed/30646845
http://dx.doi.org/10.1186/s12859-018-2576-5
author Nieroda, Lech
Maas, Lukas
Thiebes, Scott
Lang, Ulrich
Sunyaev, Ali
Achter, Viktor
Peifer, Martin
collection PubMed
description BACKGROUND: The massive amounts of data from next generation sequencing (NGS) methods pose various challenges with respect to data security, storage and metadata management. While there is a broad range of data analysis pipelines, these challenges remain largely unaddressed to date. RESULTS: We describe the integration of the open-source metadata management system iRODS (Integrated Rule-Oriented Data System) with a cancer genome analysis pipeline in a high performance computing environment. The system allows for customized metadata attributes as well as fine-grained protection rules and is augmented by a user-friendly front-end for metadata input. This results in a robust, efficient end-to-end workflow under consideration of data security, central storage and unified metadata information. CONCLUSIONS: Integrating iRODS with an NGS data analysis pipeline is a suitable method for addressing the challenges of data security, storage and metadata management in NGS environments.
format Online
Article
Text
id pubmed-6334444
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6334444 2019-01-23
iRODS metadata management for a cancer genome analysis workflow
Nieroda, Lech; Maas, Lukas; Thiebes, Scott; Lang, Ulrich; Sunyaev, Ali; Achter, Viktor; Peifer, Martin
BMC Bioinformatics
Methodology Article
BioMed Central 2019-01-15
/pmc/articles/PMC6334444/
/pubmed/30646845
http://dx.doi.org/10.1186/s12859-018-2576-5
Text en
© The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title iRODS metadata management for a cancer genome analysis workflow
topic Methodology Article