‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers

INTRODUCTION: The number of new biomedical manuscripts published on important topics exceeds the capacity of single persons to read. Integration of literature is an even more elusive task. This article describes a pilot study of a scalable online system to integrate data from 1000 articles on COVID-...

Bibliographic Details
Main Authors:	Lunna, Shania; Flinn, Isabelle; Prytherch, James; Torfs-Leibman, Camille; Robtoy, Sarah; Bansak, Matt; Krag, David
Format:	Online Article Text
Language:	English
Published:	BMJ Publishing Group 2022
Subjects:	Short Report
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8914395/ https://www.ncbi.nlm.nih.gov/pubmed/35264375 http://dx.doi.org/10.1136/bmjhci-2021-100452

_version_	1784667698351833088
author	Lunna, Shania; Flinn, Isabelle; Prytherch, James; Torfs-Leibman, Camille; Robtoy, Sarah; Bansak, Matt; Krag, David
author_facet	Lunna, Shania; Flinn, Isabelle; Prytherch, James; Torfs-Leibman, Camille; Robtoy, Sarah; Bansak, Matt; Krag, David
author_sort	Lunna, Shania
collection	PubMed
description	INTRODUCTION: The number of new biomedical manuscripts published on important topics exceeds the capacity of single persons to read. Integration of literature is an even more elusive task. This article describes a pilot study of a scalable online system to integrate data from 1000 articles on COVID-19. METHODS: Articles were imported from PubMed using the query ‘COVID-19’. The full text of articles reporting new data was obtained and the results extracted manually. An online software system was used to enter the results. Similar results were bundled using note fields in parent–child order. Each extracted result was linked to the source article. Each new data entry comprised at least four note fields: (1) result, (2) population or sample, (3) description of the result and (4) topic. Articles underwent iterative rounds of group review over remote sessions. RESULTS: Screening 4126 COVID-19 articles resulted in a selection of 1000 publications presenting new data. The results were extracted and manually entered in note fields. Integration from multiple publications was achieved by sharing parent note fields by child entries. The total number of extracted primary results was 12 209. The mean number of results per article was 15.1 (SD 12.0). The average number of parent note fields for each result note field was 6.8 (SD 1.4). The total number of all note fields was 28 809. Without sharing of parent note fields, there would have been a total of 94 986 note fields. CONCLUSION: This pilot study demonstrates the feasibility of a scalable online system to extract results from 1000 manuscripts. Using four types of notes to describe each result provided standardisation of data entry and information integration. There was substantial reduction in complexity and reduction in total note fields by sharing of parent note fields. We conclude that this system provides a method to scale up extraction of information on very large topics.
format	Online Article Text
id	pubmed-8914395
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BMJ Publishing Group
record_format	MEDLINE/PubMed
spelling	pubmed-8914395 2022-03-11 ‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers Lunna, Shania Flinn, Isabelle Prytherch, James Torfs-Leibman, Camille Robtoy, Sarah Bansak, Matt Krag, David BMJ Health Care Inform Short Report INTRODUCTION: The number of new biomedical manuscripts published on important topics exceeds the capacity of single persons to read. Integration of literature is an even more elusive task. This article describes a pilot study of a scalable online system to integrate data from 1000 articles on COVID-19. METHODS: Articles were imported from PubMed using the query ‘COVID-19’. The full text of articles reporting new data was obtained and the results extracted manually. An online software system was used to enter the results. Similar results were bundled using note fields in parent–child order. Each extracted result was linked to the source article. Each new data entry comprised at least four note fields: (1) result, (2) population or sample, (3) description of the result and (4) topic. Articles underwent iterative rounds of group review over remote sessions. RESULTS: Screening 4126 COVID-19 articles resulted in a selection of 1000 publications presenting new data. The results were extracted and manually entered in note fields. Integration from multiple publications was achieved by sharing parent note fields by child entries. The total number of extracted primary results was 12 209. The mean number of results per article was 15.1 (SD 12.0). The average number of parent note fields for each result note field was 6.8 (SD 1.4). The total number of all note fields was 28 809. Without sharing of parent note fields, there would have been a total of 94 986 note fields. CONCLUSION: This pilot study demonstrates the feasibility of a scalable online system to extract results from 1000 manuscripts. Using four types of notes to describe each result provided standardisation of data entry and information integration. There was substantial reduction in complexity and reduction in total note fields by sharing of parent note fields. We conclude that this system provides a method to scale up extraction of information on very large topics. BMJ Publishing Group 2022-03-09 /pmc/articles/PMC8914395/ /pubmed/35264375 http://dx.doi.org/10.1136/bmjhci-2021-100452 Text en © Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
spellingShingle	Short Report Lunna, Shania Flinn, Isabelle Prytherch, James Torfs-Leibman, Camille Robtoy, Sarah Bansak, Matt Krag, David ‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers
title	‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers
title_sort	‘refbin’ an online platform to extract and classify large-scale information: a pilot study of covid-19 related papers
topic	Short Report
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8914395/ https://www.ncbi.nlm.nih.gov/pubmed/35264375 http://dx.doi.org/10.1136/bmjhci-2021-100452
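
The abstract reports that sharing parent note fields reduced the total from 94 986 to 28 809 note fields. The Python sketch below illustrates that idea only; it is not the Refbin implementation. The class and function names, the parent-to-child ordering (topic, population, description, result) and the sample note texts are all assumptions made up for the example. Only the two totals come from the record.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """One note field. Children are keyed by text so equal siblings are shared."""
    text: str
    children: dict = field(default_factory=dict)

    def child(self, text):
        # Reuse an existing child with the same text instead of duplicating it.
        return self.children.setdefault(text, Note(text))


def add_result(root, path):
    """Insert one result given as a parent-first list of note texts."""
    node = root
    for text in path:
        node = node.child(text)
    return node


def count_notes(node):
    """Count all note fields below `node` (the node itself is excluded)."""
    return sum(1 + count_notes(c) for c in node.children.values())


# Hypothetical entries: topic -> population -> description -> result.
paths = [
    ["Symptoms", "Hospitalised adults", "Fever at admission", "placeholder result A"],
    ["Symptoms", "Hospitalised adults", "Fever at admission", "placeholder result B"],
    ["Symptoms", "Hospitalised adults", "Cough at admission", "placeholder result C"],
]

root = Note("root")
for p in paths:
    add_result(root, p)

shared = count_notes(root)              # note fields when parents are shared
unshared = sum(len(p) for p in paths)   # note fields if every path were stored whole

# Reduction implied by the totals reported in the abstract.
reported_reduction = 1 - 28809 / 94986

print(shared, unshared, f"{reported_reduction:.1%}")
```

In this toy data the three results share their topic and population notes, so fewer note fields are stored than if each result carried its own full chain. The totals in the abstract correspond to a reduction of roughly 70%.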
