
Automated detection of over- and under-dispersion in baseline tables in randomised controlled trials

Bibliographic Details
Main author: Barnett, Adrian
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10285343/
https://www.ncbi.nlm.nih.gov/pubmed/37360941
http://dx.doi.org/10.12688/f1000research.123002.2
author Barnett, Adrian
collection PubMed
description Background: Papers describing the results of a randomised trial should include a baseline table that compares the characteristics of the randomised groups. Researchers who fraudulently generate trials often unwittingly create baseline tables that are implausibly similar (under-dispersed) or that show large differences between groups (over-dispersed). I aimed to create an automated algorithm to screen for under- and over-dispersion in the baseline tables of randomised trials.
Methods: In a cross-sectional study, I examined 2,245 randomised controlled trials published in health and medical journals on PubMed Central. I estimated the probability that a trial's baseline summary statistics were under- or over-dispersed using a Bayesian model that examined the distribution of t-statistics for the between-group differences and compared it with the distribution expected without dispersion. I used a simulation study to test the model's ability to detect under- or over-dispersion and compared its performance with an existing test of dispersion based on a uniform test of p-values. My model combined categorical and continuous summary statistics, whereas the uniform test used only continuous statistics.
Results: The algorithm was relatively accurate at extracting data from baseline tables, matching well on the size of the tables and on sample size. The Bayesian model using t-statistics outperformed the uniform test of p-values, which had many false positives for skewed, categorical and rounded data that were not under- or over-dispersed. For trials published on PubMed Central, some tables appeared under- or over-dispersed because they had an atypical presentation or contained reporting errors. Some trials flagged as under-dispersed had groups with strikingly similar summary statistics.
Conclusions: Automated fraud screening of all submitted trials is challenging because the presentation of baseline tables varies widely. The Bayesian model could be useful for targeted checks of suspected trials or authors.
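The abstract describes comparing between-group test statistics from baseline tables with the distribution expected under randomisation. The Python sketch below illustrates those ingredients under simplifying assumptions: independent table rows, a normal approximation for every statistic, and a crude mean-squared-statistic "dispersion ratio" used in place of the author's Bayesian model. It also computes a Kolmogorov-Smirnov distance between the p-values and the uniform distribution as a simple stand-in for a uniform test of p-values. All function names are hypothetical and the baseline table is made up; this is not the paper's implementation.

```python
import math
from statistics import fmean


def t_from_means(m1, sd1, n1, m2, sd2, n2):
    """Welch-type t-statistic for a between-group difference in means,
    computed from the summary statistics (mean, SD, n) of one table row."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se


def z_from_proportions(x1, n1, x2, n2):
    """Pooled z-statistic for a difference in proportions (x events out of n)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0  # both groups are all 0% or all 100%: no observed difference
    return (x1 / n1 - x2 / n2) / se


def dispersion_ratio(stats):
    """Mean squared statistic (hypothetical heuristic). Roughly 1 if rows behave
    like independent standard normal draws; well below 1 hints at
    under-dispersion, well above 1 at over-dispersion."""
    return fmean(s * s for s in stats)


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


def ks_uniform(pvalues):
    """Kolmogorov-Smirnov D statistic comparing p-values with Uniform(0, 1)."""
    ps = sorted(pvalues)
    n = len(ps)
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(ps))


# Made-up baseline table: two continuous rows and one categorical row.
stats = [
    t_from_means(54.1, 9.8, 100, 54.0, 9.9, 100),  # age, years
    t_from_means(27.3, 4.1, 100, 27.3, 4.0, 100),  # BMI, kg/m^2
    z_from_proportions(48, 100, 49, 100),          # female, count out of n
]
print(round(dispersion_ratio(stats), 4))
print(round(ks_uniform([two_sided_p(s) for s in stats]), 3))
```

In this made-up table the ratio is far below 1 and the p-values cluster near 1, the pattern associated with under-dispersion. A real screening tool has to cope with correlated rows, skewed variables, rounding and categorical variables with several levels; the abstract reports that a uniform test of p-values produced many false positives for skewed, categorical and rounded data.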
format Online
Article
Text
id pubmed-10285343
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-10285343 2023-06-23 Automated detection of over- and under-dispersion in baseline tables in randomised controlled trials Barnett, Adrian F1000Res Research Article F1000 Research Limited 2023-05-30 /pmc/articles/PMC10285343/ /pubmed/37360941 http://dx.doi.org/10.12688/f1000research.123002.2 Text en Copyright: © 2023 Barnett A. This is an open access article distributed under the terms of the Creative Commons Attribution Licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Automated detection of over- and under-dispersion in baseline tables in randomised controlled trials
topic Research Article