An automatic approach for constructing a knowledge base of symptoms in Chinese

Bibliographic Details
Main authors: Ruan, Tong; Wang, Mengjie; Sun, Jian; Wang, Ting; Zeng, Lu; Yin, Yichao; Gao, Ju
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5763289/
https://www.ncbi.nlm.nih.gov/pubmed/29297414
http://dx.doi.org/10.1186/s13326-017-0145-x
Abstract:
BACKGROUND: While a large number of well-known knowledge bases (KBs) in life science have been published as Linked Open Data, there are few KBs in Chinese. However, KBs in Chinese are necessary to automatically process and analyze electronic medical records (EMRs) written in Chinese. Among these, a symptom KB in Chinese is the most urgently needed, since symptoms are the starting point of clinical diagnosis.
RESULTS: We publish a public KB of symptoms in Chinese that covers symptoms, departments, diseases, medicines, and examinations, as well as the relations between symptoms and these related entities. To the best of our knowledge, no other KB focuses on symptoms in Chinese, and this KB is an important supplement to existing medical resources. It is constructed by fusing data automatically extracted from eight mainstream healthcare websites and three Chinese encyclopedia sites, supplemented by symptoms extracted from a large number of EMRs.
METHODS: First, we manually design the data schema with reference to the Unified Medical Language System (UMLS). Second, we extract entities from eight mainstream healthcare websites. These entities serve as seeds to train a multi-class classifier that classifies entities from the encyclopedia sites, and to train a Conditional Random Field (CRF) model that extracts symptoms from EMRs. Third, we fuse the data to resolve large-scale duplication across data sources through entity type alignment, entity mapping, and attribute mapping. Finally, we link our KB to UMLS to investigate similarities and differences between symptoms in Chinese and English.
CONCLUSIONS: The resulting KB contains more than 26,000 distinct symptoms in Chinese, including 3968 symptoms in traditional Chinese medicine, and 1029 synonym pairs for symptoms. The KB also includes concepts such as diseases and medicines, as well as relations between symptoms and these related entities. We also link our KB to UMLS and analyze the differences between symptoms in the two KBs. The KB is released as Linked Open Data, together with a demo, at https://datahub.io/dataset/symptoms-in-chinese.
Journal: J Biomed Semantics (Research), BioMed Central, published 2017-09-20
Rights: © The Author(s) 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.