Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods

Bibliographic Details
Main Authors: Yu, Zehao, Yang, Xi, Sweeting, Gianna L., Ma, Yinghan, Stolte, Skylar E., Fang, Ruogu, Wu, Yonghui
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9513862/
https://www.ncbi.nlm.nih.gov/pubmed/36167551
http://dx.doi.org/10.1186/s12911-022-01996-2
_version_ 1784798158306410496
author Yu, Zehao
Yang, Xi
Sweeting, Gianna L.
Ma, Yinghan
Stolte, Skylar E.
Fang, Ruogu
Wu, Yonghui
author_facet Yu, Zehao
Yang, Xi
Sweeting, Gianna L.
Ma, Yinghan
Stolte, Skylar E.
Fang, Ruogu
Wu, Yonghui
author_sort Yu, Zehao
collection PubMed
description BACKGROUND: Diabetic retinopathy (DR) is a leading cause of blindness in American adults. If detected, DR can be treated to prevent further damage that causes blindness. There is increasing interest in developing artificial intelligence (AI) technologies to help detect DR using electronic health records. The lesion-related information documented in fundus image reports is a valuable resource that could help diagnose DR in clinical decision support systems. However, most studies of AI-based DR diagnosis rely on medical images; few have explored the lesion-related information captured in free-text image reports.
METHODS: In this study, we examined two state-of-the-art transformer-based natural language processing (NLP) models, BERT and RoBERTa, and compared them with a recurrent neural network implemented using long short-term memory (LSTM) for extracting DR-related concepts from clinical narratives. We identified four categories of DR-related clinical concepts (lesions, eye parts, laterality, and severity), developed annotation guidelines, annotated a DR corpus of 536 image reports, and developed transformer-based NLP models for clinical concept extraction and relation extraction. We also examined relation extraction under two settings: a 'gold-standard' setting, in which gold-standard concepts were used, and an end-to-end setting.
RESULTS: For concept extraction, the BERT model pretrained with the MIMIC III dataset achieved the best performance (0.9503 and 0.9645 under strict and lenient evaluation, respectively). For relation extraction, the BERT model pretrained using general English text achieved the best strict/lenient F1-score of 0.9316. The end-to-end system, BERT_general_e2e, achieved the best strict and lenient F1-scores of 0.8578 and 0.8881, respectively. Another end-to-end system based on the RoBERTa architecture, RoBERTa_general_e2e, achieved the same strict score as BERT_general_e2e.
CONCLUSIONS: This study demonstrated the efficiency of transformer-based NLP models for clinical concept extraction and relation extraction. Our results show that it is necessary to pretrain transformer models using clinical text to optimize performance for clinical concept extraction. For relation extraction, however, transformers pretrained using general English text perform better.
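The abstract reports scores under strict and lenient evaluation without defining the two criteria. The sketch below shows one common convention for concept extraction, which is an assumption here and not confirmed by this record: a strict match requires identical span boundaries and label, while a lenient match requires only an overlapping span with the same label. The character offsets, the `Span` type, and the function names are hypothetical; the labels echo the four concept categories named in the abstract.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end, label), end exclusive.
Span = Tuple[int, int, str]


def spans_match(pred: Span, gold: Span, lenient: bool) -> bool:
    """Return True if a predicted span counts as a match for a gold span."""
    if pred[2] != gold[2]:
        return False  # labels must agree under both criteria (assumed)
    if lenient:
        # Lenient (assumed): any character overlap is enough.
        return pred[0] < gold[1] and gold[0] < pred[1]
    # Strict (assumed): boundaries must be identical.
    return pred[0] == gold[0] and pred[1] == gold[1]


def f1_score(gold: List[Span], pred: List[Span], lenient: bool = False) -> float:
    """Compute F1 from span-level precision and recall."""
    matched_pred = sum(any(spans_match(p, g, lenient) for g in gold) for p in pred)
    matched_gold = sum(any(spans_match(p, g, lenient) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up annotations for a single report, for illustration only.
gold_spans = [(0, 14, "lesion"), (22, 31, "eye_part"), (35, 43, "laterality")]
pred_spans = [(0, 14, "lesion"), (18, 31, "eye_part"), (50, 56, "severity")]

strict_f1 = f1_score(gold_spans, pred_spans, lenient=False)
lenient_f1 = f1_score(gold_spans, pred_spans, lenient=True)
print(f"strict F1 = {strict_f1:.4f}, lenient F1 = {lenient_f1:.4f}")
```

In this toy case the second prediction has the right label but the wrong left boundary, so it counts only under the lenient criterion. That is why a lenient score is never lower than the corresponding strict score, consistent with the pairs reported above.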
format Online
Article
Text
id pubmed-9513862
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9513862 2022-09-28 Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods Yu, Zehao Yang, Xi Sweeting, Gianna L. Ma, Yinghan Stolte, Skylar E. Fang, Ruogu Wu, Yonghui BMC Med Inform Decis Mak Research BioMed Central 2022-09-27 /pmc/articles/PMC9513862/ /pubmed/36167551 http://dx.doi.org/10.1186/s12911-022-01996-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Yu, Zehao
Yang, Xi
Sweeting, Gianna L.
Ma, Yinghan
Stolte, Skylar E.
Fang, Ruogu
Wu, Yonghui
Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods
title Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods
title_full Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods
title_fullStr Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods
title_full_unstemmed Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods
title_short Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods
title_sort identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9513862/
https://www.ncbi.nlm.nih.gov/pubmed/36167551
http://dx.doi.org/10.1186/s12911-022-01996-2
work_keys_str_mv AT yuzehao identifydiabeticretinopathyrelatedclinicalconceptsandtheirattributesusingtransformerbasednaturallanguageprocessingmethods
AT yangxi identifydiabeticretinopathyrelatedclinicalconceptsandtheirattributesusingtransformerbasednaturallanguageprocessingmethods
AT sweetinggiannal identifydiabeticretinopathyrelatedclinicalconceptsandtheirattributesusingtransformerbasednaturallanguageprocessingmethods
AT mayinghan identifydiabeticretinopathyrelatedclinicalconceptsandtheirattributesusingtransformerbasednaturallanguageprocessingmethods
AT stolteskylare identifydiabeticretinopathyrelatedclinicalconceptsandtheirattributesusingtransformerbasednaturallanguageprocessingmethods
AT fangruogu identifydiabeticretinopathyrelatedclinicalconceptsandtheirattributesusingtransformerbasednaturallanguageprocessingmethods
AT wuyonghui identifydiabeticretinopathyrelatedclinicalconceptsandtheirattributesusingtransformerbasednaturallanguageprocessingmethods