Automatic knowledge extraction from Chinese electronic medical records and rheumatoid arthritis knowledge graph construction
BACKGROUND: Knowledge graphs are a powerful tool for organizing knowledge, processing information and integrating scattered information, effectively visualizing the relationships among entities and supporting further intelligent applications. One of the critical tasks in building knowledge graphs is...
Main authors: | Liu, Feifei; Liu, Mingtong; Li, Meiting; Xin, Yuwei; Gao, Dongping; Wu, Jun; Zhu, Jiaan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | AME Publishing Company, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10240026/ https://www.ncbi.nlm.nih.gov/pubmed/37284084 http://dx.doi.org/10.21037/qims-22-1158 |
_version_ | 1785053648754049024 |
---|---|
author | Liu, Feifei Liu, Mingtong Li, Meiting Xin, Yuwei Gao, Dongping Wu, Jun Zhu, Jiaan |
author_sort | Liu, Feifei |
collection | PubMed |
description | BACKGROUND: Knowledge graphs are a powerful tool for organizing knowledge, processing information, and integrating scattered information; they effectively visualize the relationships among entities and support further intelligent applications. One of the critical tasks in building knowledge graphs is knowledge extraction. Existing knowledge extraction models in the Chinese medical domain usually require large-scale, high-quality, manually labeled corpora for model training. In this study, we investigate rheumatoid arthritis (RA)-related Chinese electronic medical records (CEMRs) and address the automatic knowledge extraction task with a small number of annotated samples from CEMRs, from which an authoritative RA knowledge graph is constructed. METHODS: After constructing the domain ontology of RA and completing manual labeling, we propose the MC-BERT-BiLSTM-CRF model (MC-bidirectional encoder representations from transformers, bidirectional long short-term memory, conditional random field) for the named entity recognition (NER) task and the MC-BERT + feedforward neural network (FFNN) model for the relation extraction task. The pretrained language model (MC-BERT) is trained on a large amount of unlabeled medical data and fine-tuned using other medical domain datasets. We apply the established model to automatically label the remaining CEMRs; an RA knowledge graph is then constructed from the extracted entities and entity relations, a preliminary assessment is conducted, and an intelligent application is presented. RESULTS: The proposed model outperformed other widely used models in knowledge extraction tasks, with mean F1 scores of 92.96% in entity recognition and 95.29% in relation extraction. This study preliminarily confirmed that using a pretrained medical language model could address the problem that knowledge extraction from CEMRs requires a large number of manual annotations. An RA knowledge graph was constructed from the entities identified and the relations extracted from 1,986 CEMRs. Experts verified the effectiveness of the constructed RA knowledge graph. CONCLUSIONS: In this paper, an RA knowledge graph based on CEMRs was established; the processes of data annotation, automatic knowledge extraction, and knowledge graph construction were described; and a preliminary assessment and an application were presented. The study demonstrated the viability of a pretrained language model combined with a deep neural network for knowledge extraction from CEMRs based on a small number of manually annotated samples. |
format | Online Article Text |
id | pubmed-10240026 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | AME Publishing Company |
record_format | MEDLINE/PubMed |
spelling | pubmed-10240026 2023-06-06 Automatic knowledge extraction from Chinese electronic medical records and rheumatoid arthritis knowledge graph construction Liu, Feifei Liu, Mingtong Li, Meiting Xin, Yuwei Gao, Dongping Wu, Jun Zhu, Jiaan Quant Imaging Med Surg Original Article AME Publishing Company 2023-05-08 2023-06-01 /pmc/articles/PMC10240026/ /pubmed/37284084 http://dx.doi.org/10.21037/qims-22-1158 Text en 2023 Quantitative Imaging in Medicine and Surgery. All rights reserved. https://creativecommons.org/licenses/by-nc-nd/4.0/ Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/. |
title | Automatic knowledge extraction from Chinese electronic medical records and rheumatoid arthritis knowledge graph construction |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10240026/ https://www.ncbi.nlm.nih.gov/pubmed/37284084 http://dx.doi.org/10.21037/qims-22-1158 |
work_keys_str_mv | AT liufeifei automaticknowledgeextractionfromchineseelectronicmedicalrecordsandrheumatoidarthritisknowledgegraphconstruction AT liumingtong automaticknowledgeextractionfromchineseelectronicmedicalrecordsandrheumatoidarthritisknowledgegraphconstruction AT limeiting automaticknowledgeextractionfromchineseelectronicmedicalrecordsandrheumatoidarthritisknowledgegraphconstruction AT xinyuwei automaticknowledgeextractionfromchineseelectronicmedicalrecordsandrheumatoidarthritisknowledgegraphconstruction AT gaodongping automaticknowledgeextractionfromchineseelectronicmedicalrecordsandrheumatoidarthritisknowledgegraphconstruction AT wujun automaticknowledgeextractionfromchineseelectronicmedicalrecordsandrheumatoidarthritisknowledgegraphconstruction AT zhujiaan automaticknowledgeextractionfromchineseelectronicmedicalrecordsandrheumatoidarthritisknowledgegraphconstruction |
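
The description in this record reports F1 scores for entity recognition. As a rough illustration of what such a score measures, the following is a minimal sketch of entity-level precision, recall, and F1 computed from BIO tag sequences. It is not the authors' evaluation code: the BIO tagging scheme, the function names `bio_to_spans` and `entity_f1`, and the labels `Disease` and `Drug` are all assumptions made for this example and are not taken from the article.

```python
from typing import List, Set, Tuple

# A span is (entity type, start index, end index exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if true_pos == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical labels, for illustration only.
gold = ["B-Disease", "I-Disease", "O", "B-Drug", "O"]
pred = ["B-Disease", "I-Disease", "O", "O", "B-Drug"]
print(entity_f1(gold, pred))
```

Here a predicted entity counts as correct only when its type and both boundaries match the gold annotation exactly; the article may use a different matching criterion or averaging method for its reported mean F1.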