
A digital repository with an extensible data model for biobanking and genomic analysis management

MOTIVATION: Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research evolves into international, multi-disciplinary collaborations and data sharing among institutions increases. Single standa...


Bibliographic Details
Main authors: Izzo, Massimiliano, Mortola, Francesco, Arnulfo, Gabriele, Fato, Marco M, Varesio, Luigi
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083403/
https://www.ncbi.nlm.nih.gov/pubmed/25077808
http://dx.doi.org/10.1186/1471-2164-15-S3-S3
_version_ 1782324373137391616
author Izzo, Massimiliano
Mortola, Francesco
Arnulfo, Gabriele
Fato, Marco M
Varesio, Luigi
author_facet Izzo, Massimiliano
Mortola, Francesco
Arnulfo, Gabriele
Fato, Marco M
Varesio, Luigi
author_sort Izzo, Massimiliano
collection PubMed
description MOTIVATION: Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research evolves into international, multi-disciplinary collaborations and data sharing among institutions increases. A single standard is not feasible, so it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobank management.
RESULTS: We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities: processes and events, roughly corresponding to research studies and analysis steps within a single study. A number of sequential events can be grouped into a process, building up a hierarchical structure to track patient and sample history. Each event can produce new data. Data is described by a set of user-defined metadata and may have one or more associated files. We integrated the model into a web-based digital repository with data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples from over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with capabilities for data integration with other biobanks for worldwide information sharing.
CONCLUSIONS: Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata for information sharing in specific research projects and purposes. This approach can appreciably improve interdisciplinary research collaboration and makes it possible to track patients' clinical records, sample management information, and genomic data. The web interface allows operators to easily manage, query, and annotate the files without dealing with the technicalities of the data grid.
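The process/event hierarchy summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' schema: the class names, field names (`data_type`, `metadata`, `files`), the `find_data` helper, and all sample values are hypothetical, chosen only to show how events grouped into a process might carry data with user-defined metadata, serialize to JSON, and be queried by metadata field.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Data:
    """Data produced by an event, described by user-defined metadata."""
    data_type: str                                   # hypothetical user-defined type name
    metadata: Dict[str, Any]                         # free-form, user-defined fields
    files: List[str] = field(default_factory=list)   # zero or more associated files


@dataclass
class Event:
    """One analysis step within a study (assumed shape)."""
    name: str
    data: List[Data] = field(default_factory=list)


@dataclass
class Process:
    """A research study grouping sequential events (assumed shape)."""
    name: str
    events: List[Event] = field(default_factory=list)


def find_data(process: Process, **criteria: Any) -> List[Data]:
    """Return data items whose metadata equals every key/value in criteria."""
    return [
        item
        for event in process.events
        for item in event.data
        if all(item.metadata.get(key) == value for key, value in criteria.items())
    ]


# Illustrative values only; none of these come from the BIT-Gaslini biobank.
study = Process(
    name="example-study",
    events=[
        Event(
            name="sample-collection",
            data=[Data("sample", {"tissue": "blood", "patient_id": "P001"})],
        ),
        Event(
            name="microarray-analysis",
            data=[
                Data(
                    "microarray",
                    {"platform": "example-platform", "patient_id": "P001"},
                    files=["array_P001.cel"],
                )
            ],
        ),
    ],
)

# Serialize the whole hierarchy to a JSON document.
document = json.dumps(asdict(study), indent=2)

# Query by a metadata field, tracing one patient's history across events.
matches = find_data(study, patient_id="P001")
```

Because the metadata is a plain dictionary, a new data type needs no change to the classes, which is the kind of flexibility the abstract attributes to a loosely structured model. A real repository would also have to validate user-defined types and route file access through the grid storage, neither of which this sketch attempts.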
format Online
Article
Text
id pubmed-4083403
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4083403 2014-07-18 A digital repository with an extensible data model for biobanking and genomic analysis management Izzo, Massimiliano Mortola, Francesco Arnulfo, Gabriele Fato, Marco M Varesio, Luigi BMC Genomics Research BioMed Central 2014-05-06 /pmc/articles/PMC4083403/ /pubmed/25077808 http://dx.doi.org/10.1186/1471-2164-15-S3-S3 Text en Copyright © 2014 Izzo et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A digital repository with an extensible data model for biobanking and genomic analysis management
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083403/
https://www.ncbi.nlm.nih.gov/pubmed/25077808
http://dx.doi.org/10.1186/1471-2164-15-S3-S3