iRODS metadata management for a cancer genome analysis workflow

Bibliographic Details
Main authors: Nieroda, Lech; Maas, Lukas; Thiebes, Scott; Lang, Ulrich; Sunyaev, Ali; Achter, Viktor; Peifer, Martin
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6334444/
https://www.ncbi.nlm.nih.gov/pubmed/30646845
http://dx.doi.org/10.1186/s12859-018-2576-5
Description
Summary: BACKGROUND: The massive amounts of data from next generation sequencing (NGS) methods pose various challenges with respect to data security, storage and metadata management. While there is a broad range of data analysis pipelines, these challenges remain largely unaddressed to date. RESULTS: We describe the integration of the open-source metadata management system iRODS (Integrated Rule-Oriented Data System) with a cancer genome analysis pipeline in a high performance computing environment. The system allows for customized metadata attributes as well as fine-grained protection rules and is augmented by a user-friendly front-end for metadata input. This results in a robust, efficient end-to-end workflow under consideration of data security, central storage and unified metadata information. CONCLUSIONS: Integrating iRODS with an NGS data analysis pipeline is a suitable method for addressing the challenges of data security, storage and metadata management in NGS environments.