‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers
INTRODUCTION: The number of new biomedical manuscripts published on important topics exceeds the capacity of single persons to read. Integration of literature is an even more elusive task. This article describes a pilot study of a scalable online system to integrate data from 1000 articles on COVID-...
Main Authors: | Lunna, Shania; Flinn, Isabelle; Prytherch, James; Torfs-Leibman, Camille; Robtoy, Sarah; Bansak, Matt; Krag, David |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BMJ Publishing Group 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8914395/ https://www.ncbi.nlm.nih.gov/pubmed/35264375 http://dx.doi.org/10.1136/bmjhci-2021-100452 |
_version_ | 1784667698351833088 |
---|---|
author | Lunna, Shania Flinn, Isabelle Prytherch, James Torfs-Leibman, Camille Robtoy, Sarah Bansak, Matt Krag, David |
author_facet | Lunna, Shania Flinn, Isabelle Prytherch, James Torfs-Leibman, Camille Robtoy, Sarah Bansak, Matt Krag, David |
author_sort | Lunna, Shania |
collection | PubMed |
description | INTRODUCTION: The number of new biomedical manuscripts published on important topics exceeds the capacity of single persons to read. Integration of literature is an even more elusive task. This article describes a pilot study of a scalable online system to integrate data from 1000 articles on COVID-19. METHODS: Articles were imported from PubMed using the query ‘COVID-19’. The full text of articles reporting new data was obtained and the results extracted manually. An online software system was used to enter the results. Similar results were bundled using note fields in parent–child order. Each extracted result was linked to the source article. Each new data entry comprised at least four note fields: (1) result, (2) population or sample, (3) description of the result and (4) topic. Articles underwent iterative rounds of group review over remote sessions. RESULTS: Screening 4126 COVID-19 articles resulted in a selection of 1000 publications presenting new data. The results were extracted and manually entered in note fields. Integration from multiple publications was achieved by sharing parent note fields by child entries. The total number of extracted primary results was 12 209. The mean number of results per article was 15.1 (SD 12.0). The average number of parent note fields for each result note field was 6.8 (SD 1.4). The total number of all note fields was 28 809. Without sharing of parent note fields, there would have been a total of 94 986 note fields. CONCLUSION: This pilot study demonstrates the feasibility of a scalable online system to extract results from 1000 manuscripts. Using four types of notes to describe each result provided standardisation of data entry and information integration. There was substantial reduction in complexity and reduction in total note fields by sharing of parent note fields. We conclude that this system provides a method to scale up extraction of information on very large topics. |
format | Online Article Text |
id | pubmed-8914395 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BMJ Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-89143952022-03-11 ‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers Lunna, Shania Flinn, Isabelle Prytherch, James Torfs-Leibman, Camille Robtoy, Sarah Bansak, Matt Krag, David BMJ Health Care Inform Short Report INTRODUCTION: The number of new biomedical manuscripts published on important topics exceeds the capacity of single persons to read. Integration of literature is an even more elusive task. This article describes a pilot study of a scalable online system to integrate data from 1000 articles on COVID-19. METHODS: Articles were imported from PubMed using the query ‘COVID-19’. The full text of articles reporting new data was obtained and the results extracted manually. An online software system was used to enter the results. Similar results were bundled using note fields in parent–child order. Each extracted result was linked to the source article. Each new data entry comprised at least four note fields: (1) result, (2) population or sample, (3) description of the result and (4) topic. Articles underwent iterative rounds of group review over remote sessions. RESULTS: Screening 4126 COVID-19 articles resulted in a selection of 1000 publications presenting new data. The results were extracted and manually entered in note fields. Integration from multiple publications was achieved by sharing parent note fields by child entries. The total number of extracted primary results was 12 209. The mean number of results per article was 15.1 (SD 12.0). The average number of parent note fields for each result note field was 6.8 (SD 1.4). The total number of all note fields was 28 809. Without sharing of parent note fields, there would have been a total of 94 986 note fields. CONCLUSION: This pilot study demonstrates the feasibility of a scalable online system to extract results from 1000 manuscripts. 
Using four types of notes to describe each result provided standardisation of data entry and information integration. There was substantial reduction in complexity and reduction in total note fields by sharing of parent note fields. We conclude that this system provides a method to scale up extraction of information on very large topics. BMJ Publishing Group 2022-03-09 /pmc/articles/PMC8914395/ /pubmed/35264375 http://dx.doi.org/10.1136/bmjhci-2021-100452 Text en © Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ |
spellingShingle | Short Report Lunna, Shania Flinn, Isabelle Prytherch, James Torfs-Leibman, Camille Robtoy, Sarah Bansak, Matt Krag, David ‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers |
title | ‘Refbin’ an online platform to extract and classify large-scale information: a pilot study of COVID-19 related papers |
topic | Short Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8914395/ https://www.ncbi.nlm.nih.gov/pubmed/35264375 http://dx.doi.org/10.1136/bmjhci-2021-100452 |
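
The description above reports that sharing parent note fields reduced the total from 94 986 to 28 809 note fields. The following is a minimal sketch of that idea, not the Refbin implementation: the `Note` class, the helper functions and the toy notes are all hypothetical and serve only to illustrate how child entries that point to the same parent notes avoid duplicating them. The last lines recompute the reduction from the two totals quoted in the abstract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Note:
    """One note field; `parent` links it to a broader note (hypothetical model)."""
    text: str
    parent: Optional["Note"] = None


def ancestors(note):
    """Return the chain of parent notes, from nearest parent up to the root."""
    chain = []
    current = note.parent
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain


def count_shared(results):
    """Count distinct note fields when results reuse the same parent notes."""
    seen = set()
    for result in results:
        seen.add(id(result))
        seen.update(id(parent) for parent in ancestors(result))
    return len(seen)


def count_unshared(results):
    """Count note fields if every result carried its own copy of each parent."""
    return sum(1 + len(ancestors(result)) for result in results)


# Toy hierarchy with invented placeholder notes (not data from the study).
topic = Note("topic")
description = Note("description", parent=topic)
population = Note("population or sample", parent=description)
results = [
    Note("result A", parent=population),
    Note("result B", parent=population),
    Note("result C", parent=population),
]

shared = count_shared(results)      # 3 results + 3 shared parents
unshared = count_unshared(results)  # 3 results, each with 3 copied parents

# Totals quoted in the abstract.
reported_shared = 28809
reported_unshared = 94986
reduction = 1 - reported_shared / reported_unshared

print(f"toy example: {shared} note fields shared, {unshared} unshared")
print(f"reported reduction in note fields: {reduction:.1%}")
```

With the abstract's totals, the reduction works out to roughly 70% fewer note fields than an unshared layout would need.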