Family member information extraction via neural sequence labeling models with different tag schemes

BACKGROUND: Family history information (FHI) described in unstructured electronic health records (EHRs) is a valuable information source for patient care and scientific research. Since FHI is usually described in the format of free text, the entire process of FHI extraction consists of various ste...

Bibliographic Details
Main author: Dai, Hong-Jie
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933890/
https://www.ncbi.nlm.nih.gov/pubmed/31881965
http://dx.doi.org/10.1186/s12911-019-0996-4
_version_ 1783483297239662592
author Dai, Hong-Jie
author_facet Dai, Hong-Jie
author_sort Dai, Hong-Jie
collection PubMed
description BACKGROUND: Family history information (FHI) described in unstructured electronic health records (EHRs) is a valuable information source for patient care and scientific research. Because FHI is usually recorded as free text, the full FHI extraction process consists of several steps, including section segmentation, family member and clinical observation extraction, and relation discovery between the extracted members and their observations. The extraction step involves recognizing FHI concepts along with their properties, such as the family side attribute of the family member concept. METHODS: This study focuses on the extraction step and formulates it as a sequence labeling problem. We employed a neural sequence labeling model with different tag schemes to distinguish family members and their observations. Depending on the tag scheme, the identified entities were aggregated and processed by different algorithms to determine the required properties. RESULTS: We studied the effectiveness of encoding the required properties in the tag schemes by evaluating their performance on the dataset released by the BioCreative/OHNLP challenge 2018. The proposed side scheme, together with the developed features and neural network architecture, achieved an overall F1-score of 0.849 on the test set, which ranked second in the FHI entity recognition subtask. CONCLUSIONS: The developed neural network-based models performed significantly better than conditional random field models. However, our error analysis revealed two challenging issues with the current approach. First, some properties require cross-sentence inference. Second, the current model cannot distinguish between narratives describing the patient’s family members and those describing relatives of the patient’s family members.
format Online
Article
Text
id pubmed-6933890
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6933890 2019-12-30 Family member information extraction via neural sequence labeling models with different tag schemes Dai, Hong-Jie BMC Med Inform Decis Mak Research BioMed Central 2019-12-27 /pmc/articles/PMC6933890/ /pubmed/31881965 http://dx.doi.org/10.1186/s12911-019-0996-4 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Dai, Hong-Jie
Family member information extraction via neural sequence labeling models with different tag schemes
title Family member information extraction via neural sequence labeling models with different tag schemes
title_full Family member information extraction via neural sequence labeling models with different tag schemes
title_fullStr Family member information extraction via neural sequence labeling models with different tag schemes
title_full_unstemmed Family member information extraction via neural sequence labeling models with different tag schemes
title_short Family member information extraction via neural sequence labeling models with different tag schemes
title_sort family member information extraction via neural sequence labeling models with different tag schemes
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933890/
https://www.ncbi.nlm.nih.gov/pubmed/31881965
http://dx.doi.org/10.1186/s12911-019-0996-4
work_keys_str_mv AT daihongjie familymemberinformationextractionvianeuralsequencelabelingmodelswithdifferenttagschemes