Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute

Bibliographic Details
Main authors: Chiang, Gen-Tao; Clapham, Peter; Qi, Guoying; Sale, Kevin; Coates, Guy
Format: Online article (text)
Language: English
Published: BioMed Central 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3228552/
https://www.ncbi.nlm.nih.gov/pubmed/21906284
http://dx.doi.org/10.1186/1471-2105-12-361

Abstract

BACKGROUND: Increasingly large amounts of DNA sequencing data are being generated within the Wellcome Trust Sanger Institute (WTSI), and traditional file systems struggle to handle this growing volume of sequence data. A robust data management system therefore needs to be implemented and integrated into the current WTSI infrastructure. Such a system enables effective management of the IT infrastructure behind the sequencing pipeline and allows biologists to track their data.

RESULTS: We have chosen a data grid system, iRODS (integrated Rule-Oriented Data System), to act as the data management system for the WTSI. iRODS provides a rule-based approach to system management, which makes data replication much easier and provides extra data protection. Unlike the metadata provided by traditional file systems, the iRODS metadata system is comprehensive and allows users to define their own application-level metadata. Users and IT experts at the WTSI can then query this metadata to find and track data. The aim of this paper is to describe how we designed and used iRODS as a data management system, from both the system and the user viewpoints. We describe the problems faced and the solutions found when iRODS was implemented, and introduce a simple use case showing how users within the WTSI use iRODS.

CONCLUSIONS: iRODS has been implemented and works as the production system for the sequencing pipeline of the WTSI. Both biologists and IT experts can now track and manage data, which was not previously possible. This novel approach allows biologists to define their own metadata and query the genomic data using that metadata.
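The abstract describes user-defined, application-level metadata that biologists attach to data and later query to find and track it. In iRODS such metadata takes the form of attribute-value-unit (AVU) triples, stored in the iCAT catalogue and managed with tools such as the `imeta` icommand. The block below is only a minimal, hypothetical Python sketch of that idea: a toy in-memory catalogue, not the iRODS API and not the WTSI implementation. The class names, paths, and attribute names (`study`, `sample`) are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AVU:
    """One attribute-value-unit triple attached to a data object (frozen, so hashable)."""
    attribute: str
    value: str
    unit: str = ""


@dataclass
class MetadataCatalog:
    """Hypothetical toy stand-in for a metadata catalogue; NOT the iRODS API."""
    objects: dict[str, set[AVU]] = field(default_factory=dict)

    def add(self, path: str, attribute: str, value: str, unit: str = "") -> None:
        # Attach a user-defined AVU to the object at `path`, creating its entry if needed.
        self.objects.setdefault(path, set()).add(AVU(attribute, value, unit))

    def query(self, conditions: dict[str, str]) -> list[str]:
        # Return the sorted paths whose metadata satisfies every attribute == value
        # condition (logical AND); units are ignored when matching.
        matches = []
        for path, avus in self.objects.items():
            pairs = {(avu.attribute, avu.value) for avu in avus}
            if all((attr, val) in pairs for attr, val in conditions.items()):
                matches.append(path)
        return sorted(matches)


if __name__ == "__main__":
    # Hypothetical paths and attributes, for illustration only.
    catalog = MetadataCatalog()
    catalog.add("/seq/run_1234/lane1.bam", "study", "hypothetical_study_A")
    catalog.add("/seq/run_1234/lane1.bam", "sample", "S1")
    catalog.add("/seq/run_1234/lane2.bam", "study", "hypothetical_study_A")
    catalog.add("/seq/run_1234/lane2.bam", "sample", "S2")
    print(catalog.query({"study": "hypothetical_study_A", "sample": "S2"}))
    # prints ['/seq/run_1234/lane2.bam']
```

Conditions are passed as a dictionary rather than keyword arguments so that attribute names need not be valid Python identifiers; a real deployment would delegate storage and querying to the iRODS catalogue instead of an in-process dictionary.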

Journal: BMC Bioinformatics (Correspondence), published online 2011-09-09 by BioMed Central.
Copyright ©2011 Chiang et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.