
Acquisition of a Lexicon for Family History Information: Bidirectional Encoder Representations From Transformers–Assisted Sublanguage Analysis

BACKGROUND: A patient’s family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method to capture FH information in electronic health records and a substantial portion of FH information is frequently embedded in clinical no...


Bibliographic Details
Main authors: Wang, Liwei, He, Huan, Wen, Andrew, Moon, Sungrim, Fu, Sunyang, Peterson, Kevin J, Ai, Xuguang, Liu, Sijia, Kavuluru, Ramakanth, Liu, Hongfang
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337517/
https://www.ncbi.nlm.nih.gov/pubmed/37368483
http://dx.doi.org/10.2196/48072
author Wang, Liwei
He, Huan
Wen, Andrew
Moon, Sungrim
Fu, Sunyang
Peterson, Kevin J
Ai, Xuguang
Liu, Sijia
Kavuluru, Ramakanth
Liu, Hongfang
author_sort Wang, Liwei
collection PubMed
description BACKGROUND: A patient’s family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method to capture FH information in electronic health records and a substantial portion of FH information is frequently embedded in clinical notes. This renders FH information difficult to use in downstream data analytics or clinical decision support applications. To address this issue, a natural language processing system capable of extracting and normalizing FH information can be used. OBJECTIVE: In this study, we aimed to construct an FH lexical resource for information extraction and normalization. METHODS: We exploited a transformer-based method to construct an FH lexical resource leveraging a corpus consisting of clinical notes generated as part of primary care. The usability of the lexicon was demonstrated through the development of a rule-based FH system that extracts FH entities and relations as specified in previous FH challenges. We also experimented with a deep learning–based FH system for FH information extraction. Previous FH challenge data sets were used for evaluation. RESULTS: The resulting lexicon contains 33,603 lexicon entries normalized to 6408 concept unique identifiers of the Unified Medical Language System and 15,126 codes of the Systematized Nomenclature of Medicine Clinical Terms, with an average number of 5.4 variants per concept. The performance evaluation demonstrated that the rule-based FH system achieved reasonable performance. The combination of the rule-based FH system with a state-of-the-art deep learning–based FH system can improve the recall of FH information evaluated using the BioCreative/N2C2 FH challenge data set, with the F1 score varied but comparable. CONCLUSIONS: The resulting lexicon and rule-based FH system are freely available through the Open Health Natural Language Processing GitHub.
format Online
Article
Text
id pubmed-10337517
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10337517 2023-07-13 Acquisition of a Lexicon for Family History Information: Bidirectional Encoder Representations From Transformers–Assisted Sublanguage Analysis Wang, Liwei He, Huan Wen, Andrew Moon, Sungrim Fu, Sunyang Peterson, Kevin J Ai, Xuguang Liu, Sijia Kavuluru, Ramakanth Liu, Hongfang JMIR Med Inform Original Paper JMIR Publications 2023-06-27 /pmc/articles/PMC10337517/ /pubmed/37368483 http://dx.doi.org/10.2196/48072 Text en ©Liwei Wang, Huan He, Andrew Wen, Sungrim Moon, Sunyang Fu, Kevin J Peterson, Xuguang Ai, Sijia Liu, Ramakanth Kavuluru, Hongfang Liu. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 27.06.2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title Acquisition of a Lexicon for Family History Information: Bidirectional Encoder Representations From Transformers–Assisted Sublanguage Analysis
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337517/
https://www.ncbi.nlm.nih.gov/pubmed/37368483
http://dx.doi.org/10.2196/48072