Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System

BACKGROUND: Genealogical information, such as that found in family trees, is imperative for biomedical research such as disease heritability and risk prediction. Researchers have used information on policyholders and their dependents in medical claims data and emergency contacts in electronic health reco...

Bibliographic Details
Main Authors: He, Kai; Yao, Lixia; Zhang, JiaWei; Li, Yufei; Li, Chen
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8374669/
https://www.ncbi.nlm.nih.gov/pubmed/34346903
http://dx.doi.org/10.2196/25670
_version_ 1783740166131679232
author He, Kai
Yao, Lixia
Zhang, JiaWei
Li, Yufei
Li, Chen
author_sort He, Kai
collection PubMed
description BACKGROUND: Genealogical information, such as that found in family trees, is imperative for biomedical research such as disease heritability and risk prediction. Researchers have used information on policyholders and their dependents in medical claims data and emergency contacts in electronic health records (EHRs) to infer family relationships at a large scale. We have previously demonstrated that online obituaries can be a novel data source for building more complete and accurate family trees.
OBJECTIVE: Aiming to supplement EHR data with family relationships for biomedical research, we built an end-to-end information extraction system using a multitask-based artificial neural network model to construct genealogical knowledge graphs (GKGs) from online obituaries. GKGs are enriched family trees with detailed information including age, gender, death and birth dates, and residence.
METHODS: Built on a predefined family relationship map consisting of 4 types of entities (eg, people’s names, residence, birth date, and death date) and 71 types of relationships, we curated a corpus containing 1700 online obituaries from the metropolitan area of Minneapolis and St Paul in Minnesota. We also adopted data augmentation techniques to generate additional synthetic data to alleviate the issue of data scarcity for rare family relationships. A multitask-based artificial neural network model was then built to simultaneously detect names, extract relationships between them, and assign attributes (eg, birth dates and death dates, residence, age, and gender) to each individual. Finally, we assembled related GKGs into larger ones by identifying people appearing in multiple obituaries.
RESULTS: Our system achieved satisfactory precision (94.79%), recall (91.45%), and F-1 measure (93.09%) in 10-fold cross-validation. We also constructed 12,407 GKGs, with the largest one made up of 4 generations and 30 people.
CONCLUSIONS: In this work, we discussed the significance of GKGs for biomedical research, presented a new version of a corpus with a predefined family relationship map and augmented training data, and proposed a multitask deep neural system to construct and assemble GKGs. The results show that our system can extract GKGs from obituaries and demonstrate the potential of enriching EHR data for further genetic research. We share the source code and system with the scientific community on GitHub; the corpus is withheld to protect privacy.
format Online
Article
Text
id pubmed-8374669
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8374669 2021-08-24 Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System. He, Kai; Yao, Lixia; Zhang, JiaWei; Li, Yufei; Li, Chen. J Med Internet Res. Original Paper.
JMIR Publications 2021-08-04 /pmc/articles/PMC8374669/ /pubmed/34346903 http://dx.doi.org/10.2196/25670 Text en
©Kai He, Lixia Yao, JiaWei Zhang, Yufei Li, Chen Li. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 04.08.2021.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System
topic Original Paper