
Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach

MOTIVATION: Automated extraction of population, intervention, comparison/control, and outcome (PICO) from the randomized controlled trial (RCT) abstracts is important for evidence synthesis. Previous studies have demonstrated the feasibility of applying natural language processing (NLP) for PICO ext...


Bibliographic Details
Main Authors:	Hu, Yan; Keloth, Vipina K; Raja, Kalpana; Chen, Yong; Xu, Hua
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10500081/ https://www.ncbi.nlm.nih.gov/pubmed/37669123 http://dx.doi.org/10.1093/bioinformatics/btad542

author	Hu, Yan; Keloth, Vipina K; Raja, Kalpana; Chen, Yong; Xu, Hua
author_facet	Hu, Yan; Keloth, Vipina K; Raja, Kalpana; Chen, Yong; Xu, Hua
author_sort	Hu, Yan
collection	PubMed
description	MOTIVATION: Automated extraction of population, intervention, comparison/control, and outcome (PICO) elements from randomized controlled trial (RCT) abstracts is important for evidence synthesis. Previous studies have demonstrated the feasibility of applying natural language processing (NLP) to PICO extraction. However, performance remains suboptimal because of the complexity of PICO information in RCT abstracts and the challenges involved in their annotation. RESULTS: We propose a two-step NLP pipeline to extract PICO elements from RCT abstracts: (i) sentence classification using a prompt-based learning model and (ii) PICO extraction using a named entity recognition (NER) model. First, the sentences in each abstract were categorized into four sections: background, methods, results, and conclusions. Next, the NER model was applied to extract PICO elements from sentences in the title and methods sections, which contain >96% of the PICO information. We evaluated the pipeline on three datasets: EBM-NLP(mod), a randomly selected and re-annotated set of 500 RCT abstracts from the EBM-NLP corpus; a dataset of 150 Coronavirus Disease 2019 (COVID-19) RCT abstracts; and a dataset of 150 Alzheimer’s disease (AD) RCT abstracts. In the end-to-end evaluation, the approach achieved an overall micro F1 score of 0.833 on EBM-NLP(mod), 0.928 on the COVID-19 dataset, and 0.899 on the AD dataset at the token level, and 0.712 on EBM-NLP(mod), 0.850 on the COVID-19 dataset, and 0.805 on the AD dataset at the entity level. AVAILABILITY AND IMPLEMENTATION: Our code and datasets are publicly available at https://github.com/BIDS-Xu-Lab/section_specific_annotation_of_PICO.
format	Online Article Text
id	pubmed-10500081
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10500081 2023-09-15 Bioinformatics Original Paper Oxford University Press 2023-09-05 /pmc/articles/PMC10500081/ /pubmed/37669123 http://dx.doi.org/10.1093/bioinformatics/btad542 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10500081/ https://www.ncbi.nlm.nih.gov/pubmed/37669123 http://dx.doi.org/10.1093/bioinformatics/btad542
