Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC
Background: High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. T...
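Main authors: | Mate, Sebastian; Kampf, Marvin; Rödle, Wolfgang; Kraus, Stefan; Proynova, Rumyana; Silander, Kaisa; Ebert, Lars; Lablans, Martin; Schüttler, Christina; Knell, Christian; Eklund, Niina; Hummel, Michael; Holub, Petr; Prokosch, Hans-Ulrich |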
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Georg Thieme Verlag KG, 2019 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6739205/ https://www.ncbi.nlm.nih.gov/pubmed/31509880 http://dx.doi.org/10.1055/s-0039-1695793 |
_version_ | 1783450900589707264 |
---|---|
author | Mate, Sebastian Kampf, Marvin Rödle, Wolfgang Kraus, Stefan Proynova, Rumyana Silander, Kaisa Ebert, Lars Lablans, Martin Schüttler, Christina Knell, Christian Eklund, Niina Hummel, Michael Holub, Petr Prokosch, Hans-Ulrich |
author_facet | Mate, Sebastian Kampf, Marvin Rödle, Wolfgang Kraus, Stefan Proynova, Rumyana Silander, Kaisa Ebert, Lars Lablans, Martin Schüttler, Christina Knell, Christian Eklund, Niina Hummel, Michael Holub, Petr Prokosch, Hans-Ulrich |
author_sort | Mate, Sebastian |
collection | PubMed |
description | Background: High-quality clinical data and biological specimens are key for medical research and personalized medicine. The Biobanking and Biomolecular Resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) aims to facilitate access to such biological resources. The accompanying ADOPT BBMRI-ERIC project kick-started BBMRI-ERIC by collecting colorectal cancer data from European biobanks. Objectives: To transform these data into a common representation, a uniform approach for data integration and harmonization had to be developed. This article describes the design and the implementation of a toolset for this task. Methods: Based on the semantics of a metadata repository, we developed a lexical bag-of-words matcher, capable of semiautomatically mapping local biobank terms to the central ADOPT BBMRI-ERIC terminology. Its algorithm supports fuzzy matching, utilization of synonyms, and sentiment tagging. To process the anonymized instance data based on these mappings, we also developed a data transformation application. Results: The implementation was used to process the data from 10 European biobanks. The lexical matcher automatically and correctly mapped 78.48% of the 1,492 local biobank terms, and human experts were able to complete the remaining mappings. We used the expert-curated mappings to successfully process 147,608 data records from 3,415 patients. Conclusion: A generic harmonization approach was created and successfully used for cross-institutional data harmonization across 10 European biobanks. The software tools were made available as open source. |
format | Online Article Text |
id | pubmed-6739205 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Georg Thieme Verlag KG |
record_format | MEDLINE/PubMed |
spelling | pubmed-6739205 2020-08-01 Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC Mate, Sebastian Kampf, Marvin Rödle, Wolfgang Kraus, Stefan Proynova, Rumyana Silander, Kaisa Ebert, Lars Lablans, Martin Schüttler, Christina Knell, Christian Eklund, Niina Hummel, Michael Holub, Petr Prokosch, Hans-Ulrich Appl Clin Inform Georg Thieme Verlag KG 2019-08 2019-09-11 /pmc/articles/PMC6739205/ /pubmed/31509880 http://dx.doi.org/10.1055/s-0039-1695793 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which permits unrestricted reproduction and distribution, for non-commercial purposes only; and use and reproduction, but not distribution, of adapted material for non-commercial purposes only, provided the original work is properly cited. |
title | Pan-European Data Harmonization for Biobanks in ADOPT BBMRI-ERIC |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6739205/ https://www.ncbi.nlm.nih.gov/pubmed/31509880 http://dx.doi.org/10.1055/s-0039-1695793 |
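
The Methods part of the description mentions a lexical bag-of-words matcher with fuzzy matching and synonym support. The following is a minimal sketch of that general idea only, not the authors' implementation: the synonym table, the thresholds, the scoring formula, and all function names are assumptions made for illustration, and sentiment tagging is left out. Fuzzy token comparison uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table (assumption): maps a local word to a canonical word.
SYNONYMS = {"sex": "gender", "dob": "birthdate"}


def tokens(term):
    """Lower-case a term, split on non-alphanumerics, and normalize synonyms."""
    cleaned = "".join(c if c.isalnum() else " " for c in term.lower())
    return {SYNONYMS.get(word, word) for word in cleaned.split()}


def token_similarity(a, b):
    """Similarity of two tokens in [0, 1], based on matching character blocks."""
    return SequenceMatcher(None, a, b).ratio()


def bag_score(local, central, fuzzy_threshold=0.8):
    """Share of tokens in both bags that have a fuzzy partner in the other bag."""
    a, b = tokens(local), tokens(central)
    if not a or not b:
        return 0.0

    def hits(src, dst):
        # Count tokens in src whose best partner in dst clears the threshold.
        return sum(
            1 for s in src
            if max(token_similarity(s, d) for d in dst) >= fuzzy_threshold
        )

    return (hits(a, b) + hits(b, a)) / (len(a) + len(b))


def best_match(local, central_terms, min_score=0.5):
    """Return (central term, score), or (None, score) if nothing is close enough."""
    score, term = max((bag_score(local, c), c) for c in central_terms)
    return (term, score) if score >= min_score else (None, score)


if __name__ == "__main__":
    # Made-up example terms, not taken from the ADOPT BBMRI-ERIC terminology.
    central = ["Patient gender", "Date of diagnosis", "Tumour stage"]
    for local in ["Sex of patient", "Tumor stagee", "Blood pressure"]:
        print(local, "->", best_match(local, central))
```

In a semiautomatic workflow of the kind the description outlines, local terms for which such a matcher returns `None` (or a low score) would be the ones handed to human experts for manual mapping.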