Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC

Background: High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. T...

Bibliographic Details
Main Authors: Mate, Sebastian, Kampf, Marvin, Rödle, Wolfgang, Kraus, Stefan, Proynova, Rumyana, Silander, Kaisa, Ebert, Lars, Lablans, Martin, Schüttler, Christina, Knell, Christian, Eklund, Niina, Hummel, Michael, Holub, Petr, Prokosch, Hans-Ulrich
Format: Online Article Text
Language: English
Published: Georg Thieme Verlag KG 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6739205/
https://www.ncbi.nlm.nih.gov/pubmed/31509880
http://dx.doi.org/10.1055/s-0039-1695793
author Mate, Sebastian
Kampf, Marvin
Rödle, Wolfgang
Kraus, Stefan
Proynova, Rumyana
Silander, Kaisa
Ebert, Lars
Lablans, Martin
Schüttler, Christina
Knell, Christian
Eklund, Niina
Hummel, Michael
Holub, Petr
Prokosch, Hans-Ulrich
collection PubMed
description Background: High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. The accompanying ADOPT BBMRI-ERIC project kick-started BBMRI-ERIC by collecting colorectal cancer data from European biobanks. Objectives: To transform these data into a common representation, a uniform approach for data integration and harmonization had to be developed. This article describes the design and the implementation of a toolset for this task. Methods: Based on the semantics of a metadata repository, we developed a lexical bag-of-words matcher, capable of semiautomatically mapping local biobank terms to the central ADOPT BBMRI-ERIC terminology. Its algorithm supports fuzzy matching, utilization of synonyms, and sentiment tagging. To process the anonymized instance data based on these mappings, we also developed a data transformation application. Results: The implementation was used to process the data from 10 European biobanks. The lexical matcher automatically and correctly mapped 78.48% of the 1,492 local biobank terms, and human experts were able to complete the remaining mappings. We used the expert-curated mappings to successfully process 147,608 data records from 3,415 patients. Conclusion: A generic harmonization approach was created and successfully used for cross-institutional data harmonization across 10 European biobanks. The software tools were made available as open source.
format Online
Article
Text
id pubmed-6739205
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Georg Thieme Verlag KG
record_format MEDLINE/PubMed
spelling pubmed-6739205 2020-08-01 Appl Clin Inform
Georg Thieme Verlag KG 2019-08 2019-09-11 /pmc/articles/PMC6739205/ /pubmed/31509880 http://dx.doi.org/10.1055/s-0039-1695793 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which permits unrestricted reproduction and distribution, for non-commercial purposes only; and use and reproduction, but not distribution, of adapted material for non-commercial purposes only, provided the original work is properly cited.
title Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC
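
The Methods part of the abstract mentions a lexical bag-of-words matcher that supports fuzzy matching and synonyms. The following is a minimal sketch of that general idea only, not the authors' implementation: the synonym table, stop-word list, thresholds, example terms, and all function names are hypothetical, and sentiment tagging is left out.

```python
from difflib import SequenceMatcher

# Hypothetical lookup tables; the real toolset's sources are not given in this record.
SYNONYMS = {"sex": "gender", "dob": "birthdate"}
STOPWORDS = {"of", "the", "at"}


def tokenize(term):
    """Lower-case a term, split on non-alphanumerics, drop stop words, map synonyms."""
    words = "".join(c if c.isalnum() else " " for c in term.lower()).split()
    return {SYNONYMS.get(w, w) for w in words if w not in STOPWORDS}


def word_similarity(a, b):
    """Character-level similarity in [0, 1] between two words."""
    return SequenceMatcher(None, a, b).ratio()


def bag_score(local, central, fuzzy_threshold=0.8):
    """Share of words that find a close-enough counterpart, relative to the larger bag."""
    local_bag, central_bag = tokenize(local), tokenize(central)
    if not local_bag or not central_bag:
        return 0.0
    hits = sum(
        1
        for c in central_bag
        if max(word_similarity(c, l) for l in local_bag) >= fuzzy_threshold
    )
    return hits / max(len(local_bag), len(central_bag))


def best_match(local, central_terms, min_score=0.5):
    """Return (central_term, score) for the best candidate, or None if none is good enough."""
    if not central_terms:
        return None
    scored = [(bag_score(local, c), c) for c in central_terms]
    score, term = max(scored, key=lambda pair: pair[0])
    return (term, score) if score >= min_score else None


# Hypothetical central terminology and local terms, for illustration only.
CENTRAL = ["Date of diagnosis", "Gender", "Histopathology grade"]

if __name__ == "__main__":
    for local_term in ["Diagnosis_Date", "Sex", "Histopatology grade", "Freezer ID"]:
        print(local_term, "->", best_match(local_term, CENTRAL))
```

In this sketch, a local term that stays below `min_score` returns `None`, which corresponds to the cases the abstract says were left for human experts to map.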