An automatic approach for constructing a knowledge base of symptoms in Chinese
BACKGROUND: While a large number of well-known knowledge bases (KBs) in life science have been published as Linked Open Data, there are few KBs in Chinese. However, KBs in Chinese are necessary when we want to automatically process and analyze electronic medical records (EMRs) in Chinese. Of all, th...
Main authors: | Ruan, Tong; Wang, Mengjie; Sun, Jian; Wang, Ting; Zeng, Lu; Yin, Yichao; Gao, Ju |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5763289/ https://www.ncbi.nlm.nih.gov/pubmed/29297414 http://dx.doi.org/10.1186/s13326-017-0145-x |
_version_ | 1783291854452686848 |
---|---|
author | Ruan, Tong Wang, Mengjie Sun, Jian Wang, Ting Zeng, Lu Yin, Yichao Gao, Ju |
author_facet | Ruan, Tong Wang, Mengjie Sun, Jian Wang, Ting Zeng, Lu Yin, Yichao Gao, Ju |
author_sort | Ruan, Tong |
collection | PubMed |
description | BACKGROUND: While a large number of well-known knowledge bases (KBs) in life science have been published as Linked Open Data, there are few KBs in Chinese. However, KBs in Chinese are necessary for automatically processing and analyzing electronic medical records (EMRs) in Chinese. Among these, a symptom KB in Chinese is the most urgently needed, since symptoms are the starting point of clinical diagnosis. RESULTS: We publish a public KB of symptoms in Chinese, including symptoms, departments, diseases, medicines, and examinations, as well as relations between symptoms and these related entities. To the best of our knowledge, there is no existing KB focusing on symptoms in Chinese, and the KB is an important supplement to existing medical resources. Our KB is constructed by fusing data automatically extracted from eight mainstream healthcare websites and three Chinese encyclopedia sites, supplemented by symptoms extracted from a large number of EMRs. METHODS: First, we manually design a data schema with reference to the Unified Medical Language System (UMLS). Second, we extract entities from eight mainstream healthcare websites; these are fed as seeds to train a multi-class classifier that classifies entities from encyclopedia sites, and to train a Conditional Random Field (CRF) model that extracts symptoms from EMRs. Third, we fuse the data to resolve the large-scale duplication between different data sources through entity type alignment, entity mapping, and attribute mapping. Finally, we link our KB to UMLS to investigate similarities and differences between symptoms in Chinese and English. CONCLUSIONS: As a result, the KB has more than 26,000 distinct symptoms in Chinese, including 3968 symptoms in traditional Chinese medicine and 1029 synonym pairs for symptoms. The KB also includes concepts such as diseases and medicines, as well as relations between symptoms and these related entities. We also link our KB to the Unified Medical Language System and analyze the differences between symptoms in the two KBs. We released the KB as Linked Open Data and a demo at https://datahub.io/dataset/symptoms-in-chinese. |
format | Online Article Text |
id | pubmed-5763289 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-57632892018-01-17 An automatic approach for constructing a knowledge base of symptoms in Chinese Ruan, Tong Wang, Mengjie Sun, Jian Wang, Ting Zeng, Lu Yin, Yichao Gao, Ju J Biomed Semantics Research BACKGROUND: While a large number of well-known knowledge bases (KBs) in life science have been published as Linked Open Data, there are few KBs in Chinese. However, KBs in Chinese are necessary when we want to automatically process and analyze electronic medical records (EMRs) in Chinese. Of all, the symptom KB in Chinese is the most seriously in need, since symptoms are the starting point of clinical diagnosis. RESULTS: We publish a public KB of symptoms in Chinese, including symptoms, departments, diseases, medicines, and examinations as well as relations between symptoms and the above related entities. To the best of our knowledge, there is no such KB focusing on symptoms in Chinese, and the KB is an important supplement to existing medical resources. Our KB is constructed by fusing data automatically extracted from eight mainstream healthcare websites, three Chinese encyclopedia sites, and symptoms extracted from a larger number of EMRs as supplements. METHODS: Firstly, we design data schema manually by reference to the Unified Medical Language System (UMLS). Secondly, we extract entities from eight mainstream healthcare websites, which are fed as seeds to train a multi-class classifier and classify entities from encyclopedia sites and train a Conditional Random Field (CRF) model to extract symptoms from EMRs. Thirdly, we fuse data to solve the large-scale duplication between different data sources according to entity type alignment, entity mapping, and attribute mapping. Finally, we link our KB to UMLS to investigate similarities and differences between symptoms in Chinese and English. 
CONCLUSIONS: As a result, the KB has more than 26,000 distinct symptoms in Chinese including 3968 symptoms in traditional Chinese medicine and 1029 synonym pairs for symptoms. The KB also includes concepts such as diseases and medicines as well as relations between symptoms and the above related entities. We also link our KB to the Unified Medical Language System and analyze the differences between symptoms in the two KBs. We released the KB as Linked Open Data and a demo at https://datahub.io/dataset/symptoms-in-chinese. BioMed Central 2017-09-20 /pmc/articles/PMC5763289/ /pubmed/29297414 http://dx.doi.org/10.1186/s13326-017-0145-x Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Ruan, Tong Wang, Mengjie Sun, Jian Wang, Ting Zeng, Lu Yin, Yichao Gao, Ju An automatic approach for constructing a knowledge base of symptoms in Chinese |
title | An automatic approach for constructing a knowledge base of symptoms in Chinese |
title_sort | automatic approach for constructing a knowledge base of symptoms in chinese |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5763289/ https://www.ncbi.nlm.nih.gov/pubmed/29297414 http://dx.doi.org/10.1186/s13326-017-0145-x |