Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System
BACKGROUND: Genealogical information, such as that found in family trees, is imperative for biomedical research such as disease heritability and risk prediction. Researchers have used policyholder and their dependent information in medical claims data and emergency contacts in electronic health reco...
Main Authors: | He, Kai; Yao, Lixia; Zhang, JiaWei; Li, Yufei; Li, Chen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8374669/ https://www.ncbi.nlm.nih.gov/pubmed/34346903 http://dx.doi.org/10.2196/25670 |
_version_ | 1783740166131679232 |
---|---|
author | He, Kai Yao, Lixia Zhang, JiaWei Li, Yufei Li, Chen |
author_facet | He, Kai Yao, Lixia Zhang, JiaWei Li, Yufei Li, Chen |
author_sort | He, Kai |
collection | PubMed |
description | BACKGROUND: Genealogical information, such as that found in family trees, is imperative for biomedical research such as disease heritability and risk prediction. Researchers have used policyholder and dependent information in medical claims data and emergency contacts in electronic health records (EHRs) to infer family relationships at a large scale. We have previously demonstrated that online obituaries can be a novel data source for building more complete and accurate family trees. OBJECTIVE: Aiming to supplement EHR data with family relationships for biomedical research, we built an end-to-end information extraction system using a multitask-based artificial neural network model to construct genealogical knowledge graphs (GKGs) from online obituaries. GKGs are enriched family trees with detailed information including age, gender, death and birth dates, and residence. METHODS: Built on a predefined family relationship map consisting of 4 types of entities (eg, people’s name, residence, birth date, and death date) and 71 types of relationships, we curated a corpus containing 1700 online obituaries from the metropolitan area of Minneapolis and St Paul in Minnesota. We also used data augmentation to generate additional synthetic data to alleviate data scarcity for rare family relationships. A multitask-based artificial neural network model was then built to simultaneously detect names, extract relationships between them, and assign attributes (eg, birth dates and death dates, residence, age, and gender) to each individual. Finally, we assembled related GKGs into larger ones by identifying people appearing in multiple obituaries. RESULTS: Our system achieved satisfactory precision (94.79%), recall (91.45%), and F1 score (93.09%) on 10-fold cross-validation. We also constructed 12,407 GKGs, with the largest one made up of 4 generations and 30 people. CONCLUSIONS: In this work, we discussed the meaning of GKGs for biomedical research, presented a new version of a corpus with a predefined family relationship map and augmented training data, and proposed a multitask deep neural system to construct and assemble GKGs. The results show that our system can extract GKGs and demonstrate the potential of enriching EHR data for further genetic research. We share the source code and system with the entire scientific community on GitHub, without the corpus for privacy protection. |
format | Online Article Text |
id | pubmed-8374669 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-8374669 2021-08-24 Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System He, Kai Yao, Lixia Zhang, JiaWei Li, Yufei Li, Chen J Med Internet Res Original Paper JMIR Publications 2021-08-04 /pmc/articles/PMC8374669/ /pubmed/34346903 http://dx.doi.org/10.2196/25670 Text en ©Kai He, Lixia Yao, JiaWei Zhang, Yufei Li, Chen Li. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 04.08.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included. |
spellingShingle | Original Paper He, Kai Yao, Lixia Zhang, JiaWei Li, Yufei Li, Chen Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System |
title | Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System |
title_full | Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System |
title_fullStr | Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System |
title_full_unstemmed | Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System |
title_short | Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System |
title_sort | construction of genealogical knowledge graphs from obituaries: multitask neural network extraction system |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8374669/ https://www.ncbi.nlm.nih.gov/pubmed/34346903 http://dx.doi.org/10.2196/25670 |
work_keys_str_mv | AT hekai constructionofgenealogicalknowledgegraphsfromobituariesmultitaskneuralnetworkextractionsystem AT yaolixia constructionofgenealogicalknowledgegraphsfromobituariesmultitaskneuralnetworkextractionsystem AT zhangjiawei constructionofgenealogicalknowledgegraphsfromobituariesmultitaskneuralnetworkextractionsystem AT liyufei constructionofgenealogicalknowledgegraphsfromobituariesmultitaskneuralnetworkextractionsystem AT lichen constructionofgenealogicalknowledgegraphsfromobituariesmultitaskneuralnetworkextractionsystem |
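
The description says that related GKGs are assembled into larger ones by identifying people who appear in multiple obituaries. The record does not say how that matching is done, so the following is only a minimal sketch of one way such a merge could work, using a union-find structure. The `Person` key (name plus birth date), the function name `merge_gkgs`, and the sample names are all hypothetical and are not taken from the paper or its GitHub code.

```python
from collections import defaultdict

# Hypothetical identity key: (name, birth_date). The paper's actual
# person-matching rule is not described in this record.
Person = tuple[str, str]


def merge_gkgs(gkgs: list[list[Person]]) -> list[set[Person]]:
    """Merge graphs that share at least one person (union-find)."""
    parent = list(range(len(gkgs)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link each graph to the first graph in which a person was seen.
    first_seen: dict[Person, int] = {}
    for idx, people in enumerate(gkgs):
        for person in people:
            if person in first_seen:
                parent[find(idx)] = find(first_seen[person])
            else:
                first_seen[person] = idx

    # Collect the people of every graph under its root.
    merged: dict[int, set[Person]] = defaultdict(set)
    for idx, people in enumerate(gkgs):
        merged[find(idx)].update(people)
    return list(merged.values())


# Made-up sample data: the first two graphs share one person.
example = [
    [("Ann Lee", "1930-01-01"), ("Bob Lee", "1955-05-05")],
    [("Bob Lee", "1955-05-05"), ("Cara Lee", "1980-08-08")],
    [("Dan Roe", "1940-04-04")],
]
result = merge_gkgs(example)
```

In this sketch the first two sample graphs collapse into one because they share a person, while the third stays separate. Exact-key matching is the simplest possible choice here; real obituary data would likely need fuzzier matching on names and attributes.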