New improved Aggregator: predicting which clinical trial articles derive from the same registered clinical trial

OBJECTIVES: To identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence. MATERIALS AND METHODS: We updated our previous model by creating larger, more recent, and more diverse positive and neg...

Bibliographic Details
Main Authors: Smalheiser, Neil R, Holt, Arthur W
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7660960/
https://www.ncbi.nlm.nih.gov/pubmed/33215068
http://dx.doi.org/10.1093/jamiaopen/ooaa042
author Smalheiser, Neil R
Holt, Arthur W
collection PubMed
description OBJECTIVES: To identify separate publications that report outcomes from the same underlying clinical trial, in order to avoid over-counting these as independent pieces of evidence. MATERIALS AND METHODS: We updated our previous model by creating larger, more recent, and more diverse positive and negative training sets consisting of article pairs that were (or not) linked to the same ClinicalTrials.gov trial registry number. Features were extracted from PubMed metadata; pairwise similarity scores were modeled using logistic regression and used to form clusters of articles that are likely to arise from the same registered clinical trial. RESULTS: Articles from the same trial were identified with high accuracy (F1 = 0.859), nominally better than the previous model (F1 = 0.843). Predicted clusters showed a low error rate of splitting of 8–11% (ie, when 2 articles belonged to the same trial but were assigned to different clusters). Performance was similar whether only randomized controlled trial articles or a more diverse set of clinical trial articles were processed. DISCUSSION: Metadata are surprisingly accurate in predicting when 2 articles derive from the same underlying clinical trial. CONCLUSION: We have continued confidence in the Aggregator tool which can be accessed publicly at http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.
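The pipeline summarized in the description (pairwise metadata features, a logistic model over them, then clustering of high-scoring pairs) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the toy records, the two Jaccard features, the hand-set coefficients, the 0.5 threshold, and the single-link grouping via union-find are not taken from the article, whose actual features, fitted weights, and clustering procedure are not given in this record.

```python
import math
from itertools import combinations

# Hypothetical toy records; field names and values are illustrative only.
ARTICLES = {
    "A1": {"title": "Metformin in type 2 diabetes: a randomized trial",
           "authors": {"smith j", "lee k"}},
    "A2": {"title": "Metformin in type 2 diabetes: two-year follow-up of a randomized trial",
           "authors": {"smith j", "lee k", "wong p"}},
    "A3": {"title": "Aspirin for stroke prevention in older adults",
           "authors": {"garcia m"}},
}

# Hypothetical hand-set coefficients; a real model would fit these
# on labeled positive and negative article pairs.
WEIGHTS = {"bias": -3.0, "title_jaccard": 4.0, "author_jaccard": 4.0}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(x, y):
    """Similarity features for one article pair, from metadata only."""
    return {
        "title_jaccard": jaccard(set(x["title"].lower().split()),
                                 set(y["title"].lower().split())),
        "author_jaccard": jaccard(x["authors"], y["authors"]),
    }


def same_trial_probability(x, y):
    """Logistic model: sigmoid of a weighted sum of pair features."""
    feats = pair_features(x, y)
    z = WEIGHTS["bias"] + sum(WEIGHTS[k] * v for k, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def cluster(articles, threshold=0.5):
    """Group articles by linking every pair scored at or above threshold."""
    parent = {k: k for k in articles}

    def find(k):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for a, b in combinations(sorted(articles), 2):
        if same_trial_probability(articles[a], articles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for k in articles:
        groups.setdefault(find(k), set()).add(k)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    print(cluster(ARTICLES))
```

In this toy setup the two metformin records share most title words and two authors, so they score well above the threshold and are grouped, while the aspirin record stays in its own cluster. A "splitting" error in the sense used in the description would be two articles from the same trial ending up in different groups.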
format Online
Article
Text
id pubmed-7660960
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7660960 2020-11-18 JAMIA Open Application Notes Oxford University Press 2020-10-28 /pmc/articles/PMC7660960/ /pubmed/33215068 http://dx.doi.org/10.1093/jamiaopen/ooaa042 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title New improved Aggregator: predicting which clinical trial articles derive from the same registered clinical trial
topic Application Notes