Semi-Automatic Knowledge Extraction from COVID-19 Scientific Literature: the COKE Project: Davide Golinelli

BACKGROUND: The COVID-19 pandemic highlighted the importance of rapidly updating scientific information. However, the guidelines’ drafting process is highly time- and resource-consuming. The COKE Project aims to accelerate and streamline the extraction and synthesis of scientific evidence. To do so,...

Bibliographic Details
Main authors: Golinelli, D, Nuzzolese, AG, Sanmarchi, F, Bulla, L, Mongiovì, M, Gangemi, A, Rucci, P
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9593420/
http://dx.doi.org/10.1093/eurpub/ckac129.643
_version_ 1784815155328057344
author Golinelli, D
Nuzzolese, AG
Sanmarchi, F
Bulla, L
Mongiovì, M
Gangemi, A
Rucci, P
author_sort Golinelli, D
collection PubMed
description BACKGROUND: The COVID-19 pandemic highlighted the importance of rapidly updating scientific information. However, the guidelines’ drafting process is highly time- and resource-consuming. The COKE Project aims to accelerate and streamline the extraction and synthesis of scientific evidence. To do so, the Project used deep learning to implement a semi-automated system that enhances the systematic literature review processes. We aim to show some preliminary results on the automatic classification of abstract sentences in papers related to COVID-19. METHODS: The tool is based on Natural Language Processing algorithms to detect and classify PICO elements and medical terms and organize abstracts accordingly. We built a BERT + bi-LSTM language model. The tool was trained on a corpus of 24,668 abstracts unrelated to COVID-19. We assessed the tool performance in a specific topic related to COVID-19 that has not been covered during training. To carry out manual validation, we randomly selected 50 abstracts. Abstract sentences were classified by 2 domain experts into 7 types: Aim (A), Participants (P), Intervention (I), Outcome (O), Method (M), Results (R), and Conclusion (C). The performance of the tool was compared with that of the experts in terms of precision, recall, and F1. RESULTS: The classifier proved to have a 76% overall accuracy. Precision, recall, and F1 were above 75% for all types of sentences except I, M, and P. CONCLUSIONS: The results indicate a promising ability of the semi-automated classifier to predict expert-validated labels on abstracts of different topics. Our proposed tool is expected to significantly reduce the effort for producing medical guidelines and therefore have a strong, positive impact, particularly in emergency scenarios. The COKE Project also represents a call-to-action for similar initiatives, aimed at enhancing the information extraction process in medicine. 
KEY MESSAGES: • A rapidly changing healthcare requires fast decisions supported by scientific evidence. This is not compatible with the human limits in cognitive skills that reduce the ability to extract information. • The COKE Project aims to speed up the creation of healthcare guidelines, semi-automating parts of the workflow, and supporting the human-performed process of extracting and analyzing contents.
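The evaluation described in the abstract above (comparing the classifier's labels with expert labels by precision, recall, and F1 over the seven sentence types) can be illustrated with a minimal sketch. The function names and the toy label lists are hypothetical and are not the study's code or data. Per-class metrics are computed one-vs-rest, which is an assumption, since the abstract does not state how the scores were derived.

```python
from collections import Counter

# The seven sentence types named in the abstract.
LABELS = ["A", "P", "I", "O", "M", "R", "C"]


def accuracy(gold, pred):
    """Fraction of sentences whose predicted label equals the expert label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_metrics(gold, pred, labels=LABELS):
    """One-vs-rest precision, recall, and F1 for each label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # expert label was missed
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Toy example (hypothetical labels, NOT the study's validation set).
gold = ["A", "P", "I", "O", "M", "R", "C", "R"]
pred = ["A", "P", "M", "O", "M", "R", "C", "C"]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(gold, pred):.2f}")
    for label, m in per_class_metrics(gold, pred).items():
        print(label, {k: round(v, 2) for k, v in m.items()})
```

In this toy data the single Intervention sentence is mislabeled as Method, so I scores zero on all three metrics while M keeps full recall but loses precision. This shows how per-class scores can diverge from overall accuracy.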
format Online
Article
Text
id pubmed-9593420
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9593420 2022-11-04 Semi-Automatic Knowledge Extraction from COVID-19 Scientific Literature: the COKE Project: Davide Golinelli Golinelli, D Nuzzolese, AG Sanmarchi, F Bulla, L Mongiovì, M Gangemi, A Rucci, P Eur J Public Health Parallel Programme Oxford University Press 2022-10-25 /pmc/articles/PMC9593420/ http://dx.doi.org/10.1093/eurpub/ckac129.643 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of the European Public Health Association. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Semi-Automatic Knowledge Extraction from COVID-19 Scientific Literature: the COKE Project: Davide Golinelli
topic Parallel Programme
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9593420/
http://dx.doi.org/10.1093/eurpub/ckac129.643