
‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers

INTRODUCTION: The number of new biomedical manuscripts published on important topics exceeds the capacity of single persons to read. Integration of literature is an even more elusive task. This article describes a pilot study of a scalable online system to integrate data from 1000 articles on COVID-...

Bibliographic Details
Main authors: Lunna, Shania, Flinn, Isabelle, Prytherch, James, Torfs-Leibman, Camille, Robtoy, Sarah, Bansak, Matt, Krag, David
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8914395/
https://www.ncbi.nlm.nih.gov/pubmed/35264375
http://dx.doi.org/10.1136/bmjhci-2021-100452
_version_ 1784667698351833088
author Lunna, Shania
Flinn, Isabelle
Prytherch, James
Torfs-Leibman, Camille
Robtoy, Sarah
Bansak, Matt
Krag, David
author_facet Lunna, Shania
Flinn, Isabelle
Prytherch, James
Torfs-Leibman, Camille
Robtoy, Sarah
Bansak, Matt
Krag, David
author_sort Lunna, Shania
collection PubMed
description INTRODUCTION: The number of new biomedical manuscripts published on important topics exceeds the capacity of single persons to read. Integration of literature is an even more elusive task. This article describes a pilot study of a scalable online system to integrate data from 1000 articles on COVID-19. METHODS: Articles were imported from PubMed using the query ‘COVID-19’. The full text of articles reporting new data was obtained and the results extracted manually. An online software system was used to enter the results. Similar results were bundled using note fields in parent–child order. Each extracted result was linked to the source article. Each new data entry comprised at least four note fields: (1) result, (2) population or sample, (3) description of the result and (4) topic. Articles underwent iterative rounds of group review over remote sessions. RESULTS: Screening 4126 COVID-19 articles resulted in a selection of 1000 publications presenting new data. The results were extracted and manually entered in note fields. Integration from multiple publications was achieved by sharing parent note fields by child entries. The total number of extracted primary results was 12 209. The mean number of results per article was 15.1 (SD 12.0). The average number of parent note fields for each result note field was 6.8 (SD 1.4). The total number of all note fields was 28 809. Without sharing of parent note fields, there would have been a total of 94 986 note fields. CONCLUSION: This pilot study demonstrates the feasibility of a scalable online system to extract results from 1000 manuscripts. Using four types of notes to describe each result provided standardisation of data entry and information integration. There was substantial reduction in complexity and reduction in total note fields by sharing of parent note fields. We conclude that this system provides a method to scale up extraction of information on very large topics.
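The parent–child sharing of note fields described in the abstract above can be illustrated with a small sketch. This is not the Refbin implementation: the `Note` class, the helper functions, the ordering of the chain (topic as the top-most parent, then description, then population, then result) and all example values are assumptions made for illustration only. Only the two totals at the end (28 809 and 94 986) come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Note:
    """A hypothetical note field; each note points to at most one parent."""
    text: str
    parent: Optional["Note"] = None
    source: Optional[str] = None  # source article, set on result notes only


def ancestors(note: Note) -> List[Note]:
    """Return the chain of parent notes above `note`, nearest parent first."""
    chain = []
    current = note.parent
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain


def count_shared(results: List[Note]) -> int:
    """Count distinct notes when results reuse the same parent notes."""
    seen = set()
    for result in results:
        seen.add(id(result))
        for parent in ancestors(result):
            seen.add(id(parent))
    return len(seen)


def count_unshared(results: List[Note]) -> int:
    """Count notes if every result carried its own private copy of its parents."""
    return sum(1 + len(ancestors(result)) for result in results)


# Invented example: two results from different articles under the same parents.
topic = Note("Mortality")
description = Note("In-hospital mortality rate", parent=topic)
population = Note("Hospitalised adults", parent=description)
results = [
    Note("12%", parent=population, source="article A"),
    Note("15%", parent=population, source="article B"),
]

shared = count_shared(results)      # 3 shared parents + 2 results
unshared = count_unshared(results)  # 2 results x (1 result + 3 parents)

# Totals reported in the abstract: 28 809 note fields with sharing,
# 94 986 without. The percentage below is derived from those two numbers.
reduction_pct = round((1 - 28809 / 94986) * 100, 1)
```

In this toy case, sharing reduces eight note fields to five. Applying the same comparison to the totals reported in the abstract gives a reduction of roughly 70% in the number of note fields.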
format Online
Article
Text
id pubmed-8914395
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-89143952022-03-11 ‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers Lunna, Shania Flinn, Isabelle Prytherch, James Torfs-Leibman, Camille Robtoy, Sarah Bansak, Matt Krag, David BMJ Health Care Inform Short Report INTRODUCTION: The number of new biomedical manuscripts published on important topics exceeds the capacity of single persons to read. Integration of literature is an even more elusive task. This article describes a pilot study of a scalable online system to integrate data from 1000 articles on COVID-19. METHODS: Articles were imported from PubMed using the query ‘COVID-19’. The full text of articles reporting new data was obtained and the results extracted manually. An online software system was used to enter the results. Similar results were bundled using note fields in parent–child order. Each extracted result was linked to the source article. Each new data entry comprised at least four note fields: (1) result, (2) population or sample, (3) description of the result and (4) topic. Articles underwent iterative rounds of group review over remote sessions. RESULTS: Screening 4126 COVID-19 articles resulted in a selection of 1000 publications presenting new data. The results were extracted and manually entered in note fields. Integration from multiple publications was achieved by sharing parent note fields by child entries. The total number of extracted primary results was 12 209. The mean number of results per article was 15.1 (SD 12.0). The average number of parent note fields for each result note field was 6.8 (SD 1.4). The total number of all note fields was 28 809. Without sharing of parent note fields, there would have been a total of 94 986 note fields. CONCLUSION: This pilot study demonstrates the feasibility of a scalable online system to extract results from 1000 manuscripts. 
Using four types of notes to describe each result provided standardisation of data entry and information integration. There was substantial reduction in complexity and reduction in total note fields by sharing of parent note fields. We conclude that this system provides a method to scale up extraction of information on very large topics. BMJ Publishing Group 2022-03-09 /pmc/articles/PMC8914395/ /pubmed/35264375 http://dx.doi.org/10.1136/bmjhci-2021-100452 Text en © Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ (https://creativecommons.org/licenses/by-nc/4.0/).
spellingShingle Short Report
Lunna, Shania
Flinn, Isabelle
Prytherch, James
Torfs-Leibman, Camille
Robtoy, Sarah
Bansak, Matt
Krag, David
‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers
title ‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers
title_full ‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers
title_fullStr ‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers
title_full_unstemmed ‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers
title_short ‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers
title_sort ‘refbin’ an online platform to extract and classify large-scale information: a pilot study of covid-19 related papers
topic Short Report
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8914395/
https://www.ncbi.nlm.nih.gov/pubmed/35264375
http://dx.doi.org/10.1136/bmjhci-2021-100452
work_keys_str_mv AT lunnashania refbinanonlineplatformtoextractandclassifylargescaleinformationapilotstudyofcovid19relatedpapers
AT flinnisabelle refbinanonlineplatformtoextractandclassifylargescaleinformationapilotstudyofcovid19relatedpapers
AT prytherchjames refbinanonlineplatformtoextractandclassifylargescaleinformationapilotstudyofcovid19relatedpapers
AT torfsleibmancamille refbinanonlineplatformtoextractandclassifylargescaleinformationapilotstudyofcovid19relatedpapers
AT robtoysarah refbinanonlineplatformtoextractandclassifylargescaleinformationapilotstudyofcovid19relatedpapers
AT bansakmatt refbinanonlineplatformtoextractandclassifylargescaleinformationapilotstudyofcovid19relatedpapers
AT kragdavid refbinanonlineplatformtoextractandclassifylargescaleinformationapilotstudyofcovid19relatedpapers