Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer
BACKGROUND: Meta-analyses aggregate results of different clinical studies to assess the effectiveness of a treatment. Despite their importance, meta-analyses are time-consuming and labor-intensive as they involve reading hundreds of research articles and extracting data. The number of research artic...
Main authors: | Mutinda, Faith Wavinya; Liew, Kongmeng; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9206132/ https://www.ncbi.nlm.nih.gov/pubmed/35717167 http://dx.doi.org/10.1186/s12911-022-01897-4 |
_version_ | 1784729275806515200 |
---|---|
author | Mutinda, Faith Wavinya Liew, Kongmeng Yada, Shuntaro Wakamiya, Shoko Aramaki, Eiji |
author_facet | Mutinda, Faith Wavinya Liew, Kongmeng Yada, Shuntaro Wakamiya, Shoko Aramaki, Eiji |
author_sort | Mutinda, Faith Wavinya |
collection | PubMed |
description | BACKGROUND: Meta-analyses aggregate results of different clinical studies to assess the effectiveness of a treatment. Despite their importance, meta-analyses are time-consuming and labor-intensive as they involve reading hundreds of research articles and extracting data. The number of research articles is increasing rapidly and most meta-analyses are outdated shortly after publication as new evidence has not been included. Automatic extraction of data from research articles can expedite the meta-analysis process and allow for automatic updates when new results become available. In this study, we propose a system for automatically extracting data from research abstracts and performing statistical analysis. MATERIALS AND METHODS: Our corpus consists of 1011 PubMed abstracts of breast cancer randomized controlled trials annotated with the core elements of clinical trials: Participants, Intervention, Control, and Outcomes (PICO). We proposed a BERT-based named entity recognition (NER) model to identify PICO information from research abstracts. After extracting the PICO information, we parse numeric outcomes to identify the number of patients having certain outcomes for statistical analysis. RESULTS: The NER model extracted PICO elements with relatively high accuracy, achieving F1-scores greater than 0.80 in most entities. We assessed the performance of the proposed system by reproducing the results of an existing meta-analysis. The data extraction step achieved high accuracy, however the statistical analysis step achieved low performance because abstracts sometimes lack all the required information. CONCLUSION: We proposed a system for automatically extracting data from research abstracts and performing statistical analysis. We evaluated the performance of the system by reproducing an existing meta-analysis and the system achieved a relatively good performance, though more substantiation is required. |
format | Online Article Text |
id | pubmed-9206132 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-92061322022-06-19 Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer Mutinda, Faith Wavinya Liew, Kongmeng Yada, Shuntaro Wakamiya, Shoko Aramaki, Eiji BMC Med Inform Decis Mak Research BACKGROUND: Meta-analyses aggregate results of different clinical studies to assess the effectiveness of a treatment. Despite their importance, meta-analyses are time-consuming and labor-intensive as they involve reading hundreds of research articles and extracting data. The number of research articles is increasing rapidly and most meta-analyses are outdated shortly after publication as new evidence has not been included. Automatic extraction of data from research articles can expedite the meta-analysis process and allow for automatic updates when new results become available. In this study, we propose a system for automatically extracting data from research abstracts and performing statistical analysis. MATERIALS AND METHODS: Our corpus consists of 1011 PubMed abstracts of breast cancer randomized controlled trials annotated with the core elements of clinical trials: Participants, Intervention, Control, and Outcomes (PICO). We proposed a BERT-based named entity recognition (NER) model to identify PICO information from research abstracts. After extracting the PICO information, we parse numeric outcomes to identify the number of patients having certain outcomes for statistical analysis. RESULTS: The NER model extracted PICO elements with relatively high accuracy, achieving F1-scores greater than 0.80 in most entities. We assessed the performance of the proposed system by reproducing the results of an existing meta-analysis. The data extraction step achieved high accuracy, however the statistical analysis step achieved low performance because abstracts sometimes lack all the required information. 
CONCLUSION: We proposed a system for automatically extracting data from research abstracts and performing statistical analysis. We evaluated the performance of the system by reproducing an existing meta-analysis and the system achieved a relatively good performance, though more substantiation is required. BioMed Central 2022-06-18 /pmc/articles/PMC9206132/ /pubmed/35717167 http://dx.doi.org/10.1186/s12911-022-01897-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Mutinda, Faith Wavinya Liew, Kongmeng Yada, Shuntaro Wakamiya, Shoko Aramaki, Eiji Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer |
title | Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer |
title_full | Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer |
title_fullStr | Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer |
title_full_unstemmed | Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer |
title_short | Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer |
title_sort | automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9206132/ https://www.ncbi.nlm.nih.gov/pubmed/35717167 http://dx.doi.org/10.1186/s12911-022-01897-4 |
work_keys_str_mv | AT mutindafaithwavinya automaticdataextractiontosupportmetaanalysisstatisticalanalysisacasestudyonbreastcancer AT liewkongmeng automaticdataextractiontosupportmetaanalysisstatisticalanalysisacasestudyonbreastcancer AT yadashuntaro automaticdataextractiontosupportmetaanalysisstatisticalanalysisacasestudyonbreastcancer AT wakamiyashoko automaticdataextractiontosupportmetaanalysisstatisticalanalysisacasestudyonbreastcancer AT aramakieiji automaticdataextractiontosupportmetaanalysisstatisticalanalysisacasestudyonbreastcancer |
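
The description above mentions parsing numeric outcomes to find the number of patients with a given outcome, then running a statistical analysis. The following is a minimal sketch of what such a step might look like, not the authors' implementation. The record does not say which patterns or estimator the paper uses, so the regular expression, the function names, and the choice of a fixed-effect inverse-variance pooled odds ratio are all assumptions made for illustration.

```python
import math
import re

# Hypothetical pattern (an assumption): matches "12/50" or "12 of 50",
# read as events / group size.
COUNT_RE = re.compile(r"(\d+)\s*(?:/|of)\s*(\d+)")


def parse_counts(text):
    """Return (events, total) found in an outcome phrase, or None."""
    match = COUNT_RE.search(text)
    if match is None:
        return None
    events, total = int(match.group(1)), int(match.group(2))
    if total == 0 or events > total:
        return None  # not a plausible events/total pair
    return events, total


def log_odds_ratio(e1, n1, e2, n2):
    """Log odds ratio and its variance for one two-arm study.

    Adds 0.5 to every cell when any cell is zero (a common continuity
    correction, assumed here).
    """
    a, b, c, d = e1, n1 - e1, e2, n2 - e2
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def pooled_odds_ratio(studies):
    """Fixed-effect inverse-variance pooled OR with a 95% interval.

    `studies` is a list of (events_trt, n_trt, events_ctl, n_ctl).
    """
    weight_sum = 0.0
    weighted_sum = 0.0
    for e1, n1, e2, n2 in studies:
        log_or, variance = log_odds_ratio(e1, n1, e2, n2)
        weight = 1.0 / variance
        weight_sum += weight
        weighted_sum += weight * log_or
    pooled = weighted_sum / weight_sum
    se = math.sqrt(1.0 / weight_sum)
    return (
        math.exp(pooled),
        math.exp(pooled - 1.96 * se),
        math.exp(pooled + 1.96 * se),
    )


if __name__ == "__main__":
    # Made-up example phrases, not data from the article.
    arms = [("10 of 100 patients", "20/100 patients")]
    studies = []
    for treatment_text, control_text in arms:
        treatment = parse_counts(treatment_text)
        control = parse_counts(control_text)
        if treatment and control:  # skip studies with missing counts
            studies.append((*treatment, *control))
    print(pooled_odds_ratio(studies))
```

Studies whose abstracts lack one of the counts are skipped in this sketch, which mirrors the limitation noted in the description: abstracts sometimes lack all the required information.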