
Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer

BACKGROUND: Meta-analyses aggregate results of different clinical studies to assess the effectiveness of a treatment. Despite their importance, meta-analyses are time-consuming and labor-intensive as they involve reading hundreds of research articles and extracting data. The number of research artic...


Bibliographic Details
Main Authors: Mutinda, Faith Wavinya; Liew, Kongmeng; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9206132/
https://www.ncbi.nlm.nih.gov/pubmed/35717167
http://dx.doi.org/10.1186/s12911-022-01897-4
_version_ 1784729275806515200
author Mutinda, Faith Wavinya
Liew, Kongmeng
Yada, Shuntaro
Wakamiya, Shoko
Aramaki, Eiji
author_facet Mutinda, Faith Wavinya
Liew, Kongmeng
Yada, Shuntaro
Wakamiya, Shoko
Aramaki, Eiji
author_sort Mutinda, Faith Wavinya
collection PubMed
description BACKGROUND: Meta-analyses aggregate results of different clinical studies to assess the effectiveness of a treatment. Despite their importance, meta-analyses are time-consuming and labor-intensive as they involve reading hundreds of research articles and extracting data. The number of research articles is increasing rapidly and most meta-analyses are outdated shortly after publication as new evidence has not been included. Automatic extraction of data from research articles can expedite the meta-analysis process and allow for automatic updates when new results become available. In this study, we propose a system for automatically extracting data from research abstracts and performing statistical analysis. MATERIALS AND METHODS: Our corpus consists of 1011 PubMed abstracts of breast cancer randomized controlled trials annotated with the core elements of clinical trials: Participants, Intervention, Control, and Outcomes (PICO). We proposed a BERT-based named entity recognition (NER) model to identify PICO information from research abstracts. After extracting the PICO information, we parse numeric outcomes to identify the number of patients having certain outcomes for statistical analysis. RESULTS: The NER model extracted PICO elements with relatively high accuracy, achieving F1-scores greater than 0.80 for most entities. We assessed the performance of the proposed system by reproducing the results of an existing meta-analysis. The data extraction step achieved high accuracy; however, the statistical analysis step achieved low performance because abstracts do not always contain all the required information. CONCLUSION: We proposed a system for automatically extracting data from research abstracts and performing statistical analysis. We evaluated the performance of the system by reproducing an existing meta-analysis and the system achieved relatively good performance, though further substantiation is required.
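The description above mentions parsing numeric outcomes into patient counts and then running a statistical analysis. The sketch below is only an illustration of that idea, not the authors' system: it swaps the BERT-based NER step for a simple regular expression, and it assumes (the abstract does not say) that the effect measure is an odds ratio pooled with a fixed-effect inverse-variance model. The names COUNT_PATTERN, parse_counts, log_odds_ratio, and pooled_odds_ratio are hypothetical.

```python
import math
import re

# Hypothetical pattern: event counts written as "12/50" or "12 of 50".
# A stand-in for the NER-based extraction described in the abstract.
COUNT_PATTERN = re.compile(r"(\d+)\s*(?:/|of)\s*(\d+)")


def parse_counts(text):
    """Return (events, total) pairs found in an outcome phrase."""
    return [(int(e), int(n)) for e, n in COUNT_PATTERN.findall(text)]


def log_odds_ratio(e1, n1, e2, n2):
    """Log odds ratio and its variance for one two-arm study.

    Adds 0.5 to every cell when any cell is zero (assumed correction).
    """
    a, b, c, d = e1, n1 - e1, e2, n2 - e2
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_odds_ratio(studies):
    """Fixed-effect inverse-variance pooling of log odds ratios.

    `studies` is a list of (events_1, total_1, events_2, total_2) tuples.
    Returns (pooled OR, lower 95% bound, upper 95% bound).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for e1, n1, e2, n2 in studies:
        log_or, var = log_odds_ratio(e1, n1, e2, n2)
        weighted_sum += log_or / var
        weight_total += 1 / var
    pooled = weighted_sum / weight_total
    se = math.sqrt(1 / weight_total)
    return (
        math.exp(pooled),
        math.exp(pooled - 1.96 * se),
        math.exp(pooled + 1.96 * se),
    )


if __name__ == "__main__":
    # Invented sentence for illustration only; not taken from the corpus.
    sentence = "Response occurred in 30/100 patients versus 20 of 100 controls"
    (e1, n1), (e2, n2) = parse_counts(sentence)
    print(pooled_odds_ratio([(e1, n1, e2, n2)]))
```

A sketch like this also shows why the statistical step can fail on abstracts alone: if a sentence reports only a percentage or a p-value, no (events, total) pair can be recovered and the study drops out of the pooled estimate.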
format Online
Article
Text
id pubmed-9206132
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9206132 2022-06-19 Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer Mutinda, Faith Wavinya Liew, Kongmeng Yada, Shuntaro Wakamiya, Shoko Aramaki, Eiji BMC Med Inform Decis Mak Research BACKGROUND: Meta-analyses aggregate results of different clinical studies to assess the effectiveness of a treatment. Despite their importance, meta-analyses are time-consuming and labor-intensive as they involve reading hundreds of research articles and extracting data. The number of research articles is increasing rapidly and most meta-analyses are outdated shortly after publication as new evidence has not been included. Automatic extraction of data from research articles can expedite the meta-analysis process and allow for automatic updates when new results become available. In this study, we propose a system for automatically extracting data from research abstracts and performing statistical analysis. MATERIALS AND METHODS: Our corpus consists of 1011 PubMed abstracts of breast cancer randomized controlled trials annotated with the core elements of clinical trials: Participants, Intervention, Control, and Outcomes (PICO). We proposed a BERT-based named entity recognition (NER) model to identify PICO information from research abstracts. After extracting the PICO information, we parse numeric outcomes to identify the number of patients having certain outcomes for statistical analysis. RESULTS: The NER model extracted PICO elements with relatively high accuracy, achieving F1-scores greater than 0.80 for most entities. We assessed the performance of the proposed system by reproducing the results of an existing meta-analysis. The data extraction step achieved high accuracy; however, the statistical analysis step achieved low performance because abstracts do not always contain all the required information.
CONCLUSION: We proposed a system for automatically extracting data from research abstracts and performing statistical analysis. We evaluated the performance of the system by reproducing an existing meta-analysis and the system achieved relatively good performance, though further substantiation is required. BioMed Central 2022-06-18 /pmc/articles/PMC9206132/ /pubmed/35717167 http://dx.doi.org/10.1186/s12911-022-01897-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Mutinda, Faith Wavinya
Liew, Kongmeng
Yada, Shuntaro
Wakamiya, Shoko
Aramaki, Eiji
Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer
title Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer
title_sort automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9206132/
https://www.ncbi.nlm.nih.gov/pubmed/35717167
http://dx.doi.org/10.1186/s12911-022-01897-4