A digital repository with an extensible data model for biobanking and genomic analysis management
MOTIVATION: Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research evolves into international multi-disciplinary collaborations with increasing data sharing among institutions. Single standa...
Main authors: | Izzo, Massimiliano; Mortola, Francesco; Arnulfo, Gabriele; Fato, Marco M; Varesio, Luigi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083403/ https://www.ncbi.nlm.nih.gov/pubmed/25077808 http://dx.doi.org/10.1186/1471-2164-15-S3-S3 |
_version_ | 1782324373137391616 |
---|---|
author | Izzo, Massimiliano; Mortola, Francesco; Arnulfo, Gabriele; Fato, Marco M; Varesio, Luigi |
author_facet | Izzo, Massimiliano; Mortola, Francesco; Arnulfo, Gabriele; Fato, Marco M; Varesio, Luigi |
author_sort | Izzo, Massimiliano |
collection | PubMed |
description | MOTIVATION: Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research evolves into international multi-disciplinary collaborations with increasing data sharing among institutions. A single standard is not feasible, so it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobank management. RESULTS: We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities: processes and events, roughly corresponding to research studies and analysis steps within a single study. A number of sequential events can be grouped in a process, building up a hierarchical structure to track patient and sample history. Each event can produce new data. Data is described by a set of user-defined metadata and may have one or more associated files. We integrated the model in a web-based digital repository with data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples from over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with data integration capabilities with other biobanks for worldwide information sharing. CONCLUSIONS: Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata for information sharing in specific research projects and purposes. This approach can significantly improve interdisciplinary research collaboration and allows tracking of patients' clinical records, sample management information, and genomic data. The web interface allows operators to easily manage, query, and annotate the files without dealing with the technicalities of the data grid. |
format | Online Article Text |
id | pubmed-4083403 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4083403 2014-07-18 A digital repository with an extensible data model for biobanking and genomic analysis management Izzo, Massimiliano; Mortola, Francesco; Arnulfo, Gabriele; Fato, Marco M; Varesio, Luigi BMC Genomics Research MOTIVATION: Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research evolves into international multi-disciplinary collaborations with increasing data sharing among institutions. A single standard is not feasible, so it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobank management. RESULTS: We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities: processes and events, roughly corresponding to research studies and analysis steps within a single study. A number of sequential events can be grouped in a process, building up a hierarchical structure to track patient and sample history. Each event can produce new data. Data is described by a set of user-defined metadata and may have one or more associated files. We integrated the model in a web-based digital repository with data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples from over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with data integration capabilities with other biobanks for worldwide information sharing. CONCLUSIONS: Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata for information sharing in specific research projects and purposes. This approach can significantly improve interdisciplinary research collaboration and allows tracking of patients' clinical records, sample management information, and genomic data. The web interface allows operators to easily manage, query, and annotate the files without dealing with the technicalities of the data grid. BioMed Central 2014-05-06 /pmc/articles/PMC4083403/ /pubmed/25077808 http://dx.doi.org/10.1186/1471-2164-15-S3-S3 Text en Copyright © 2014 Izzo et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Izzo, Massimiliano Mortola, Francesco Arnulfo, Gabriele Fato, Marco M Varesio, Luigi A digital repository with an extensible data model for biobanking and genomic analysis management |
title | A digital repository with an extensible data model for biobanking and genomic analysis management |
title_full | A digital repository with an extensible data model for biobanking and genomic analysis management |
title_fullStr | A digital repository with an extensible data model for biobanking and genomic analysis management |
title_full_unstemmed | A digital repository with an extensible data model for biobanking and genomic analysis management |
title_short | A digital repository with an extensible data model for biobanking and genomic analysis management |
title_sort | digital repository with an extensible data model for biobanking and genomic analysis management |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083403/ https://www.ncbi.nlm.nih.gov/pubmed/25077808 http://dx.doi.org/10.1186/1471-2164-15-S3-S3 |
work_keys_str_mv | AT izzomassimiliano adigitalrepositorywithanextensibledatamodelforbiobankingandgenomicanalysismanagement AT mortolafrancesco adigitalrepositorywithanextensibledatamodelforbiobankingandgenomicanalysismanagement AT arnulfogabriele adigitalrepositorywithanextensibledatamodelforbiobankingandgenomicanalysismanagement AT fatomarcom adigitalrepositorywithanextensibledatamodelforbiobankingandgenomicanalysismanagement AT varesioluigi adigitalrepositorywithanextensibledatamodelforbiobankingandgenomicanalysismanagement AT izzomassimiliano digitalrepositorywithanextensibledatamodelforbiobankingandgenomicanalysismanagement AT mortolafrancesco digitalrepositorywithanextensibledatamodelforbiobankingandgenomicanalysismanagement AT arnulfogabriele digitalrepositorywithanextensibledatamodelforbiobankingandgenomicanalysismanagement AT fatomarcom digitalrepositorywithanextensibledatamodelforbiobankingandgenomicanalysismanagement AT varesioluigi digitalrepositorywithanextensibledatamodelforbiobankingandgenomicanalysismanagement |
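
The description field above outlines a JSON data model built on processes, events, and data items carrying user-defined metadata and optional files. The following is a minimal Python sketch of that idea, not the schema used by the authors: every class name, field name, metadata key, sample value, and file path here (`Process`, `Event`, `Data`, `find_data`, `data_type`, `/grid/example/...`) is a hypothetical placeholder chosen for illustration, and the query is a plain in-memory exact match rather than the repository's search interface.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Data:
    """A data item produced by an event (all field names are hypothetical)."""
    data_type: str                      # name of a user-defined data type
    metadata: Dict[str, Any]            # free-form, user-defined metadata
    files: List[str] = field(default_factory=list)  # zero or more associated files


@dataclass
class Event:
    """One step within a process, e.g. an analysis step."""
    name: str
    produced: List[Data] = field(default_factory=list)


@dataclass
class Process:
    """A group of sequential events, e.g. a research study."""
    name: str
    events: List[Event] = field(default_factory=list)


def find_data(process: Process, data_type: str, **criteria: Any) -> List[Data]:
    """Return data items of the given type whose metadata equal all criteria."""
    return [
        item
        for event in process.events
        for item in event.produced
        if item.data_type == data_type
        and all(item.metadata.get(key) == value for key, value in criteria.items())
    ]


# Illustrative records only; none of these values come from the article.
study = Process(
    name="example-study",
    events=[
        Event(
            name="sample collection",
            produced=[Data("sample", {"tissue": "blood", "patient": "P001"})],
        ),
        Event(
            name="microarray analysis",
            produced=[
                Data(
                    "microarray",
                    {"platform": "example-chip", "patient": "P001"},
                    files=["/grid/example/array_001.cel"],
                )
            ],
        ),
    ],
)

# Metadata query: all microarray data recorded for one patient.
hits = find_data(study, "microarray", patient="P001")

# The whole hierarchy serializes to JSON without a fixed metadata schema.
study_json = json.dumps(asdict(study), indent=2)
print(study_json)
```

Because metadata is kept as an open dictionary, a new data type needs no change to the classes; only the `data_type` label and the metadata keys differ, which is one way to read the "loosely structured metadata" described in the record.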