
Exploring optimal granularity for extractive summarization of unstructured health records: Analysis of the largest multi-institutional archive of health records in Japan

Automated summarization of clinical texts can reduce the burden of medical professionals. “Discharge summaries” are one promising application of the summarization, because they can be generated from daily inpatient records. Our preliminary experiment suggests that 20–31% of the descriptions in disch...


Bibliographic Details
Main Authors: Ando, Kenichiro, Okumura, Takashi, Komachi, Mamoru, Horiguchi, Hiromasa, Matsumoto, Yuji
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931252/
https://www.ncbi.nlm.nih.gov/pubmed/36812582
http://dx.doi.org/10.1371/journal.pdig.0000099
_version_ 1784889208106647552
author Ando, Kenichiro
Okumura, Takashi
Komachi, Mamoru
Horiguchi, Hiromasa
Matsumoto, Yuji
author_facet Ando, Kenichiro
Okumura, Takashi
Komachi, Mamoru
Horiguchi, Hiromasa
Matsumoto, Yuji
author_sort Ando, Kenichiro
collection PubMed
description Automated summarization of clinical texts can reduce the burden on medical professionals. “Discharge summaries” are one promising application of summarization, because they can be generated from daily inpatient records. Our preliminary experiment suggests that 20–31% of the descriptions in discharge summaries overlap with the content of the inpatient records. However, it remains unclear how the summaries should be generated from the unstructured source. To decompose the physician’s summarization process, this study aimed to identify the optimal granularity in summarization. We first defined three types of summarization units with different granularities to compare the performance of the discharge summary generation: whole sentences, clinical segments, and clauses. We defined clinical segments in this study, aiming to express the smallest medically meaningful concepts. To obtain the clinical segments, it was necessary to automatically split the texts in the first stage of the pipeline. Accordingly, we compared rule-based methods and a machine learning method, and the latter outperformed the former with an F1 score of 0.846 in the splitting task. Next, we experimentally measured the accuracy of extractive summarization using the three types of units, based on the ROUGE-1 metric, on a multi-institutional national archive of health records in Japan. The measured accuracies of extractive summarization using whole sentences, clinical segments, and clauses were 31.91, 36.15, and 25.18, respectively. We found that the clinical segments yielded higher accuracy than sentences and clauses. This result indicates that summarization of inpatient records demands finer granularity than sentence-oriented processing. Although we used only Japanese health records, the result can be interpreted as follows: physicians extract “concepts of medical significance” from patient records and recombine them in new contexts when summarizing chronological clinical records, rather than simply copying and pasting topic sentences. This observation suggests that a discharge summary is created by higher-order information processing over concepts at the sub-sentence level, which may guide future research in this field.
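The description reports summarization accuracy with the ROUGE-1 metric, which scores unigram overlap between a generated summary and a reference. The following is a minimal sketch of that idea, not the authors' evaluation code. The function name `rouge1`, the whitespace tokenization, and the example sentences are assumptions made for illustration; Japanese text would first need a word segmenter, and the record does not state which ROUGE-1 variant (recall, precision, or F1) the reported scores use.

```python
from collections import Counter


def rouge1(candidate_tokens, reference_tokens):
    """Return (recall, precision, f1) based on clipped unigram overlap.

    Hypothetical helper for illustration; inputs are pre-tokenized lists.
    """
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    # Counter intersection keeps the minimum count of each shared token,
    # so a token repeated in the candidate is not credited more often
    # than it appears in the reference.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


# Invented example: a reference discharge summary and an extracted unit.
reference = "patient admitted with fever and discharged after antibiotics".split()
candidate = "patient admitted with fever".split()

recall, precision, f1 = rouge1(candidate, reference)
print(f"recall={recall:.3f} precision={precision:.3f} f1={f1:.3f}")
```

In this toy case every candidate token appears in the reference, so precision is high while recall reflects how much of the reference is left uncovered. Scores such as those quoted in the description are conventionally this kind of ratio multiplied by 100.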
format Online
Article
Text
id pubmed-9931252
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9931252 2023-02-16 Exploring optimal granularity for extractive summarization of unstructured health records: Analysis of the largest multi-institutional archive of health records in Japan Ando, Kenichiro Okumura, Takashi Komachi, Mamoru Horiguchi, Hiromasa Matsumoto, Yuji PLOS Digit Health Research Article Automated summarization of clinical texts can reduce the burden on medical professionals. “Discharge summaries” are one promising application of summarization, because they can be generated from daily inpatient records. Our preliminary experiment suggests that 20–31% of the descriptions in discharge summaries overlap with the content of the inpatient records. However, it remains unclear how the summaries should be generated from the unstructured source. To decompose the physician’s summarization process, this study aimed to identify the optimal granularity in summarization. We first defined three types of summarization units with different granularities to compare the performance of the discharge summary generation: whole sentences, clinical segments, and clauses. We defined clinical segments in this study, aiming to express the smallest medically meaningful concepts. To obtain the clinical segments, it was necessary to automatically split the texts in the first stage of the pipeline. Accordingly, we compared rule-based methods and a machine learning method, and the latter outperformed the former with an F1 score of 0.846 in the splitting task. Next, we experimentally measured the accuracy of extractive summarization using the three types of units, based on the ROUGE-1 metric, on a multi-institutional national archive of health records in Japan. The measured accuracies of extractive summarization using whole sentences, clinical segments, and clauses were 31.91, 36.15, and 25.18, respectively. We found that the clinical segments yielded higher accuracy than sentences and clauses. This result indicates that summarization of inpatient records demands finer granularity than sentence-oriented processing. Although we used only Japanese health records, the result can be interpreted as follows: physicians extract “concepts of medical significance” from patient records and recombine them in new contexts when summarizing chronological clinical records, rather than simply copying and pasting topic sentences. This observation suggests that a discharge summary is created by higher-order information processing over concepts at the sub-sentence level, which may guide future research in this field. Public Library of Science 2022-09-15 /pmc/articles/PMC9931252/ /pubmed/36812582 http://dx.doi.org/10.1371/journal.pdig.0000099 Text en © 2022 Ando et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Ando, Kenichiro
Okumura, Takashi
Komachi, Mamoru
Horiguchi, Hiromasa
Matsumoto, Yuji
Exploring optimal granularity for extractive summarization of unstructured health records: Analysis of the largest multi-institutional archive of health records in Japan
title Exploring optimal granularity for extractive summarization of unstructured health records: Analysis of the largest multi-institutional archive of health records in Japan
title_full Exploring optimal granularity for extractive summarization of unstructured health records: Analysis of the largest multi-institutional archive of health records in Japan
title_fullStr Exploring optimal granularity for extractive summarization of unstructured health records: Analysis of the largest multi-institutional archive of health records in Japan
title_full_unstemmed Exploring optimal granularity for extractive summarization of unstructured health records: Analysis of the largest multi-institutional archive of health records in Japan
title_short Exploring optimal granularity for extractive summarization of unstructured health records: Analysis of the largest multi-institutional archive of health records in Japan
title_sort exploring optimal granularity for extractive summarization of unstructured health records: analysis of the largest multi-institutional archive of health records in japan
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9931252/
https://www.ncbi.nlm.nih.gov/pubmed/36812582
http://dx.doi.org/10.1371/journal.pdig.0000099