Evaluating the harmonisation potential of diverse cohort datasets

Data discovery, the ability to find datasets relevant to an analysis, increases scientific opportunity, improves rigour and accelerates activity. Rapid growth in the depth, breadth, quantity and availability of data provides unprecedented opportunities and challenges for data discovery. A potential...

Bibliographic Details
Main authors: Bauermeister, Sarah, Phatak, Mukta, Sparks, Kelly, Sargent, Lana, Griswold, Michael, McHugh, Caitlin, Nalls, Mike, Young, Simon, Bauermeister, Joshua, Elliott, Paul, Steptoe, Andrew, Porteous, David, Dufouil, Carole, Gallacher, John
Format: Online Article Text
Language: English
Published: Springer Netherlands 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10232583/
https://www.ncbi.nlm.nih.gov/pubmed/37099244
http://dx.doi.org/10.1007/s10654-023-00997-3
_version_ 1785052013263847424
author Bauermeister, Sarah
Phatak, Mukta
Sparks, Kelly
Sargent, Lana
Griswold, Michael
McHugh, Caitlin
Nalls, Mike
Young, Simon
Bauermeister, Joshua
Elliott, Paul
Steptoe, Andrew
Porteous, David
Dufouil, Carole
Gallacher, John
author_facet Bauermeister, Sarah
Phatak, Mukta
Sparks, Kelly
Sargent, Lana
Griswold, Michael
McHugh, Caitlin
Nalls, Mike
Young, Simon
Bauermeister, Joshua
Elliott, Paul
Steptoe, Andrew
Porteous, David
Dufouil, Carole
Gallacher, John
author_sort Bauermeister, Sarah
collection PubMed
description Data discovery, the ability to find datasets relevant to an analysis, increases scientific opportunity, improves rigour and accelerates activity. Rapid growth in the depth, breadth, quantity and availability of data provides unprecedented opportunities and challenges for data discovery. A potential tool for increasing the efficiency of data discovery, particularly across multiple datasets, is data harmonisation. A set of 124 variables, identified as being of broad interest to neurodegeneration, were harmonised using the C-Surv data model. Harmonisation strategies used were simple calibration, algorithmic transformation and standardisation to the Z-distribution. Widely used data conventions, optimised for inclusiveness rather than aetiological precision, were used as harmonisation rules. The harmonisation scheme was applied to data from four diverse population cohorts. Of the 120 variables that were found in the datasets, correspondence between the harmonised data schema and cohort-specific data models was complete or close for 111 (93%). For the remainder, harmonisation was possible with a marginal loss of granularity. Although harmonisation is not an exact science, sufficient comparability across datasets was achieved to enable data discovery with relatively little loss of informativeness. This provides a basis for further work extending harmonisation to a larger variable list, applying the harmonisation to further datasets, and incentivising the development of data discovery tools. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10654-023-00997-3.
format Online
Article
Text
id pubmed-10232583
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer Netherlands
record_format MEDLINE/PubMed
spelling pubmed-10232583 2023-06-02 Evaluating the harmonisation potential of diverse cohort datasets Bauermeister, Sarah Phatak, Mukta Sparks, Kelly Sargent, Lana Griswold, Michael McHugh, Caitlin Nalls, Mike Young, Simon Bauermeister, Joshua Elliott, Paul Steptoe, Andrew Porteous, David Dufouil, Carole Gallacher, John Eur J Epidemiol Methods Data discovery, the ability to find datasets relevant to an analysis, increases scientific opportunity, improves rigour and accelerates activity. Rapid growth in the depth, breadth, quantity and availability of data provides unprecedented opportunities and challenges for data discovery. A potential tool for increasing the efficiency of data discovery, particularly across multiple datasets, is data harmonisation. A set of 124 variables, identified as being of broad interest to neurodegeneration, were harmonised using the C-Surv data model. Harmonisation strategies used were simple calibration, algorithmic transformation and standardisation to the Z-distribution. Widely used data conventions, optimised for inclusiveness rather than aetiological precision, were used as harmonisation rules. The harmonisation scheme was applied to data from four diverse population cohorts. Of the 120 variables that were found in the datasets, correspondence between the harmonised data schema and cohort-specific data models was complete or close for 111 (93%). For the remainder, harmonisation was possible with a marginal loss of granularity. Although harmonisation is not an exact science, sufficient comparability across datasets was achieved to enable data discovery with relatively little loss of informativeness. This provides a basis for further work extending harmonisation to a larger variable list, applying the harmonisation to further datasets, and incentivising the development of data discovery tools. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10654-023-00997-3.
Springer Netherlands 2023-04-26 2023 /pmc/articles/PMC10232583/ /pubmed/37099244 http://dx.doi.org/10.1007/s10654-023-00997-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Methods
Bauermeister, Sarah
Phatak, Mukta
Sparks, Kelly
Sargent, Lana
Griswold, Michael
McHugh, Caitlin
Nalls, Mike
Young, Simon
Bauermeister, Joshua
Elliott, Paul
Steptoe, Andrew
Porteous, David
Dufouil, Carole
Gallacher, John
Evaluating the harmonisation potential of diverse cohort datasets
title Evaluating the harmonisation potential of diverse cohort datasets
title_full Evaluating the harmonisation potential of diverse cohort datasets
title_fullStr Evaluating the harmonisation potential of diverse cohort datasets
title_full_unstemmed Evaluating the harmonisation potential of diverse cohort datasets
title_short Evaluating the harmonisation potential of diverse cohort datasets
title_sort evaluating the harmonisation potential of diverse cohort datasets
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10232583/
https://www.ncbi.nlm.nih.gov/pubmed/37099244
http://dx.doi.org/10.1007/s10654-023-00997-3
work_keys_str_mv AT bauermeistersarah evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT phatakmukta evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT sparkskelly evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT sargentlana evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT griswoldmichael evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT mchughcaitlin evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT nallsmike evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT youngsimon evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT bauermeisterjoshua evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT elliottpaul evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT steptoeandrew evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT porteousdavid evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT dufouilcarole evaluatingtheharmonisationpotentialofdiversecohortdatasets
AT gallacherjohn evaluatingtheharmonisationpotentialofdiversecohortdatasets