
Pooling individual participant data from randomized controlled trials: Exploring potential loss of information

BACKGROUND: Pooling individual participant data to enable pooled analyses is often complicated by diversity in variables across available datasets. Therefore, recoding original variables is often necessary to build a pooled dataset. We aimed to quantify how much information is lost in this process a...


Bibliographic Details
Main authors: van Wanrooij, Lennard L., Hoevenaar-Blom, Marieke P., Coley, Nicola, Ngandu, Tiia, Meiller, Yannick, Guillemont, Juliette, Rosenberg, Anna, Beishuizen, Cathrien R. L., Moll van Charante, Eric P., Soininen, Hilkka, Brayne, Carol, Andrieu, Sandrine, Kivipelto, Miia, Richard, Edo
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7217432/
https://www.ncbi.nlm.nih.gov/pubmed/32396543
http://dx.doi.org/10.1371/journal.pone.0232970
_version_ 1783532596608630784
author van Wanrooij, Lennard L.
Hoevenaar-Blom, Marieke P.
Coley, Nicola
Ngandu, Tiia
Meiller, Yannick
Guillemont, Juliette
Rosenberg, Anna
Beishuizen, Cathrien R. L.
Moll van Charante, Eric P.
Soininen, Hilkka
Brayne, Carol
Andrieu, Sandrine
Kivipelto, Miia
Richard, Edo
author_sort van Wanrooij, Lennard L.
collection PubMed
description BACKGROUND: Pooling individual participant data to enable pooled analyses is often complicated by diversity in variables across available datasets. Therefore, recoding original variables is often necessary to build a pooled dataset. We aimed to quantify how much information is lost in this process and to what extent this jeopardizes the validity of analysis results. METHODS: Data were derived from a platform that was developed to pool data from three randomized controlled trials on the effect of treatment of cardiovascular risk factors on cognitive decline or dementia. We quantified loss of information using the R-squared of linear regression models with pooled variables as a function of their original variable(s). Where the R-squared was below 0.8, we additionally explored the potential impact of loss of information on future analyses. We did this second step by checking whether the beta coefficient of the predictor differed by more than 10% between linear regression models that included the original variable and models that included the recoded variable as a confounder. In a simulation we randomly sampled numbers, recoded those ≤1000 to 0 and those >1000 to 1 and varied the range of the continuous variable, the ratio of recoded zeroes to recoded ones, or both, and again extracted the R-squared from linear models to quantify information loss. RESULTS: The R-squared was below 0.8 for 8 out of 91 recoded variables. In 4 cases this had a substantial impact on the regression models, particularly when a continuous variable was recoded into a discrete variable. Our simulation showed that the least information is lost when the ratio of recoded zeroes to ones is 1:1. CONCLUSIONS: Large, pooled datasets provide great opportunities, justifying the efforts for data harmonization. Still, caution is warranted when using recoded variables whose variance is only partly explained by their original variables, as this may jeopardize the validity of study results.
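The simulation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not state the sampling distribution, sample size, or software, so the uniform distribution, the sample size `n`, the seed, and the function names `recode`, `r_squared`, and `simulate` are all assumptions made for illustration. Only the cut-off of 1000 and the use of R-squared from a linear model come from the abstract.

```python
import numpy as np

THRESHOLD = 1000  # cut-off stated in the abstract


def recode(values, threshold=THRESHOLD):
    """Recode values <= threshold to 0 and values > threshold to 1."""
    return (np.asarray(values) > threshold).astype(int)


def r_squared(original, recoded):
    """R-squared of a simple linear regression of `recoded` on `original`.

    With a single predictor this equals the squared Pearson correlation.
    """
    r = np.corrcoef(original, recoded)[0, 1]
    return r ** 2


def simulate(low, high, n=100_000, seed=0):
    """Sample n numbers uniformly on [low, high) (an assumed distribution),
    recode them at the threshold, and return the share of zeroes and R-squared.
    """
    rng = np.random.default_rng(seed)
    original = rng.uniform(low, high, size=n)
    recoded = recode(original)
    share_zero = float(np.mean(recoded == 0))
    return share_zero, r_squared(original, recoded)


if __name__ == "__main__":
    # Changing the sampled range around the fixed threshold changes the
    # ratio of recoded zeroes to recoded ones.
    for low, high in [(0, 2000), (0, 1250), (0, 10000)]:
        share_zero, r2 = simulate(low, high)
        print(f"range [{low}, {high}): share of zeroes={share_zero:.2f}, R^2={r2:.3f}")
```

Under the uniform assumption used here, the expected R-squared is 3p(1 - p), where p is the share of recoded zeroes, so it peaks when zeroes and ones are equally frequent. That matches the direction of the reported 1:1 finding, but the numbers from this sketch are not results from the paper.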
format Online
Article
Text
id pubmed-7217432
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7217432 2020-05-26 PLoS One Research Article Public Library of Science 2020-05-12 /pmc/articles/PMC7217432/ /pubmed/32396543 http://dx.doi.org/10.1371/journal.pone.0232970 Text en © 2020 van Wanrooij et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Pooling individual participant data from randomized controlled trials: Exploring potential loss of information
topic Research Article