Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study

Bibliographic Details
Main authors: Deng, Lizong, Chen, Luming, Yang, Tao, Liu, Mi, Li, Shicheng, Jiang, Taijiao
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8277235/
https://www.ncbi.nlm.nih.gov/pubmed/34128811
http://dx.doi.org/10.2196/26892
_version_ 1783722038469328896
author Deng, Lizong
Chen, Luming
Yang, Tao
Liu, Mi
Li, Shicheng
Jiang, Taijiao
collection PubMed
description BACKGROUND: Phenotypes characterize the clinical manifestations of diseases and provide important information for diagnosis. Therefore, the construction of phenotype knowledge graphs for diseases is valuable to the development of artificial intelligence in medicine. However, phenotype knowledge graphs in current knowledge bases such as WikiData and DBpedia are coarse-grained knowledge graphs because they only consider the core concepts of phenotypes while neglecting the details (attributes) associated with these phenotypes.
OBJECTIVE: To characterize the details of disease phenotypes for clinical guidelines, we proposed a fine-grained semantic information model named PhenoSSU (semantic structured unit of phenotypes).
METHODS: PhenoSSU is an “entity-attribute-value” model by its very nature, and it aims to capture the full semantic information underlying phenotype descriptions with a series of attributes and values. A total of 193 clinical guidelines for infectious diseases from Wikipedia were selected as the study corpus, and 12 attributes from SNOMED-CT were introduced into the PhenoSSU model based on the co-occurrences of phenotype concepts and attribute values. The expressive power of the PhenoSSU model was evaluated by analyzing whether PhenoSSU instances could capture the full semantics underlying the descriptions of the corresponding phenotypes. To automatically construct fine-grained phenotype knowledge graphs, a hybrid strategy that first recognized phenotype concepts with the MetaMap tool and then predicted the attribute values of phenotypes with machine learning classifiers was developed.
RESULTS: Fine-grained phenotype knowledge graphs of 193 infectious diseases were manually constructed with the BRAT annotation tool. A total of 4020 PhenoSSU instances were annotated in these knowledge graphs, and 3757 of them (93.5%) were found to be able to capture the full semantics underlying the descriptions of the corresponding phenotypes listed in clinical guidelines. By comparison, other information models, such as the clinical element model and the HL7 fast health care interoperability resource model, could only capture the full semantics underlying 48.4% (1946/4020) and 21.8% (876/4020) of the descriptions of phenotypes listed in clinical guidelines, respectively. The hybrid strategy achieved an F1-score of 0.732 for the subtask of phenotype concept recognition and an average weighted accuracy of 0.776 for the subtask of attribute value prediction.
CONCLUSIONS: PhenoSSU is an effective information model for the precise representation of phenotype knowledge for clinical guidelines, and machine learning can be used to improve the efficiency of constructing PhenoSSU-based knowledge graphs. Our work will potentially shift the focus of medical knowledge engineering from a coarse-grained level to a more fine-grained level.
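The "entity-attribute-value" idea named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `PhenotypeUnit`, the helper `coarse_view`, and the attribute names and values ("severity", "temporal pattern") are hypothetical and are not taken from the 12 SNOMED-CT attributes used in the study.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PhenotypeUnit:
    """Hypothetical stand-in for one entity-attribute-value unit:
    a phenotype concept (entity) plus its attribute-value pairs."""
    concept: str
    attributes: Dict[str, str] = field(default_factory=dict)

    def triples(self) -> List[Tuple[str, str, str]]:
        # Flatten the unit into (entity, attribute, value) triples,
        # sorted by attribute name so the output order is stable.
        return [(self.concept, name, value)
                for name, value in sorted(self.attributes.items())]


def coarse_view(unit: PhenotypeUnit) -> str:
    # A coarse-grained graph keeps only the core concept and drops the details.
    return unit.concept


# Illustrative attribute names and values only (assumptions, not study data).
unit = PhenotypeUnit("fever", {"severity": "high",
                               "temporal pattern": "intermittent"})
for triple in unit.triples():
    print(triple)
print(coarse_view(unit))
```

The contrast between `triples()` and `coarse_view()` mirrors the distinction the abstract draws between fine-grained and coarse-grained phenotype knowledge graphs.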
format Online
Article
Text
id pubmed-8277235
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-8277235 2021-07-26 Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study Deng, Lizong Chen, Luming Yang, Tao Liu, Mi Li, Shicheng Jiang, Taijiao J Med Internet Res Original Paper JMIR Publications 2021-06-15 /pmc/articles/PMC8277235/ /pubmed/34128811 http://dx.doi.org/10.2196/26892 Text en ©Lizong Deng, Luming Chen, Tao Yang, Mi Liu, Shicheng Li, Taijiao Jiang. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 15.06.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8277235/
https://www.ncbi.nlm.nih.gov/pubmed/34128811
http://dx.doi.org/10.2196/26892