Data harmonization and federated analysis of population-based studies: the BioSHaRE project
ABSTRACTS: BACKGROUND: Individual-level data pooling of large population-based studies across research centres in international research projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these i...
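Main Authors: | Doiron, Dany; Burton, Paul; Marcon, Yannick; Gaye, Amadou; Wolffenbuttel, Bruce H R; Perola, Markus; Stolk, Ronald P; Foco, Luisa; Minelli, Cosetta; Waldenberger, Melanie; Holle, Rolf; Kvaløy, Kirsti; Hillege, Hans L; Tassé, Anne-Marie; Ferretti, Vincent; Fortier, Isabel |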
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4175511/ https://www.ncbi.nlm.nih.gov/pubmed/24257327 http://dx.doi.org/10.1186/1742-7622-10-12 |
_version_ | 1782336496895787008 |
---|---|
author | Doiron, Dany Burton, Paul Marcon, Yannick Gaye, Amadou Wolffenbuttel, Bruce H R Perola, Markus Stolk, Ronald P Foco, Luisa Minelli, Cosetta Waldenberger, Melanie Holle, Rolf Kvaløy, Kirsti Hillege, Hans L Tassé, Anne-Marie Ferretti, Vincent Fortier, Isabel |
author_facet | Doiron, Dany Burton, Paul Marcon, Yannick Gaye, Amadou Wolffenbuttel, Bruce H R Perola, Markus Stolk, Ronald P Foco, Luisa Minelli, Cosetta Waldenberger, Melanie Holle, Rolf Kvaløy, Kirsti Hillege, Hans L Tassé, Anne-Marie Ferretti, Vincent Fortier, Isabel |
author_sort | Doiron, Dany |
collection | PubMed |
description | ABSTRACTS: BACKGROUND: Individual-level data pooling of large population-based studies across research centres in international research projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these issues by building a collaborative group of investigators and developing tools for data harmonization, database integration and federated data analyses. METHODS: Eight population-based studies in six European countries were recruited to participate in the BioSHaRE project. Through workshops, teleconferences and electronic communications, participating investigators identified a set of 96 variables targeted for harmonization to answer research questions of interest. Using each study’s questionnaires, standard operating procedures, and data dictionaries, harmonization potential was assessed. Whenever harmonization was deemed possible, processing algorithms were developed and implemented in an open-source software infrastructure to transform study-specific data into the target (i.e. harmonized) format. Harmonized datasets located on servers in each research centre across Europe were interconnected through a federated database system to perform statistical analysis. RESULTS: Retrospective harmonization led to the generation of common format variables for 73% of matches considered (96 targeted variables across 8 studies). Authenticated investigators can now perform complex statistical analyses of harmonized datasets stored on distributed servers without actually sharing individual-level data using the DataSHIELD method. CONCLUSION: New Internet-based networking technologies and database management systems are providing the means to support collaborative, multi-center research in an efficient and secure manner. The results from this pilot project show that, given a strong collaborative relationship between participating studies, it is possible to seamlessly co-analyse internationally harmonized research databases while allowing each study to retain full control over individual-level data. We encourage additional collaborative research networks in epidemiology, public health, and the social sciences to make use of the open source tools presented herein. |
format | Online Article Text |
id | pubmed-4175511 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-41755112014-09-27 Data harmonization and federated analysis of population-based studies: the BioSHaRE project Doiron, Dany Burton, Paul Marcon, Yannick Gaye, Amadou Wolffenbuttel, Bruce H R Perola, Markus Stolk, Ronald P Foco, Luisa Minelli, Cosetta Waldenberger, Melanie Holle, Rolf Kvaløy, Kirsti Hillege, Hans L Tassé, Anne-Marie Ferretti, Vincent Fortier, Isabel Emerg Themes Epidemiol Analytic Perspective ABSTRACTS: BACKGROUND: Individual-level data pooling of large population-based studies across research centres in international research projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these issues by building a collaborative group of investigators and developing tools for data harmonization, database integration and federated data analyses. METHODS: Eight population-based studies in six European countries were recruited to participate in the BioSHaRE project. Through workshops, teleconferences and electronic communications, participating investigators identified a set of 96 variables targeted for harmonization to answer research questions of interest. Using each study’s questionnaires, standard operating procedures, and data dictionaries, harmonization potential was assessed. Whenever harmonization was deemed possible, processing algorithms were developed and implemented in an open-source software infrastructure to transform study-specific data into the target (i.e. harmonized) format. Harmonized datasets located on servers in each research centre across Europe were interconnected through a federated database system to perform statistical analysis. RESULTS: Retrospective harmonization led to the generation of common format variables for 73% of matches considered (96 targeted variables across 8 studies). Authenticated investigators can now perform complex statistical analyses of harmonized datasets stored on distributed servers without actually sharing individual-level data using the DataSHIELD method. CONCLUSION: New Internet-based networking technologies and database management systems are providing the means to support collaborative, multi-center research in an efficient and secure manner. The results from this pilot project show that, given a strong collaborative relationship between participating studies, it is possible to seamlessly co-analyse internationally harmonized research databases while allowing each study to retain full control over individual-level data. We encourage additional collaborative research networks in epidemiology, public health, and the social sciences to make use of the open source tools presented herein. BioMed Central 2013-11-21 /pmc/articles/PMC4175511/ /pubmed/24257327 http://dx.doi.org/10.1186/1742-7622-10-12 Text en Copyright © 2013 Doiron et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Analytic Perspective Doiron, Dany Burton, Paul Marcon, Yannick Gaye, Amadou Wolffenbuttel, Bruce H R Perola, Markus Stolk, Ronald P Foco, Luisa Minelli, Cosetta Waldenberger, Melanie Holle, Rolf Kvaløy, Kirsti Hillege, Hans L Tassé, Anne-Marie Ferretti, Vincent Fortier, Isabel Data harmonization and federated analysis of population-based studies: the BioSHaRE project |
title | Data harmonization and federated analysis of population-based studies: the BioSHaRE project |
title_full | Data harmonization and federated analysis of population-based studies: the BioSHaRE project |
title_fullStr | Data harmonization and federated analysis of population-based studies: the BioSHaRE project |
title_full_unstemmed | Data harmonization and federated analysis of population-based studies: the BioSHaRE project |
title_short | Data harmonization and federated analysis of population-based studies: the BioSHaRE project |
title_sort | data harmonization and federated analysis of population-based studies: the bioshare project |
topic | Analytic Perspective |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4175511/ https://www.ncbi.nlm.nih.gov/pubmed/24257327 http://dx.doi.org/10.1186/1742-7622-10-12 |