Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine
BACKGROUND: In this study, we focus on building a fine-grained entity annotation corpus with the corresponding annotation guideline of traditional Chinese medicine (TCM) clinical records. Our aim is to provide a basis for the fine-grained corpus construction of TCM clinical records in future. METHOD...
Main authors: | Zhang, Tingting; Wang, Yaqiang; Wang, Xiaofeng; Yang, Yafei; Ye, Ying |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7132896/ https://www.ncbi.nlm.nih.gov/pubmed/32252745 http://dx.doi.org/10.1186/s12911-020-1079-2 |
_version_ | 1783517527293296640 |
---|---|
author | Zhang, Tingting Wang, Yaqiang Wang, Xiaofeng Yang, Yafei Ye, Ying |
author_facet | Zhang, Tingting Wang, Yaqiang Wang, Xiaofeng Yang, Yafei Ye, Ying |
author_sort | Zhang, Tingting |
collection | PubMed |
description | BACKGROUND: In this study, we focus on building a fine-grained entity annotation corpus with the corresponding annotation guideline of traditional Chinese medicine (TCM) clinical records. Our aim is to provide a basis for the fine-grained corpus construction of TCM clinical records in future. METHODS: We developed a four-step approach that is suitable for the construction of TCM medical records in our corpus. First, we determined the entity types included in this study through sample annotation. Then, we drafted a fine-grained annotation guideline by summarizing the characteristics of the dataset and referring to some existing guidelines. We iteratively updated the guidelines until the inter-annotator agreement (IAA) exceeded a Cohen’s kappa value of 0.9. Comprehensive annotations were performed while keeping the IAA value above 0.9. RESULTS: We annotated the 10,197 clinical records in five rounds. Four entity categories involving 13 entity types were employed. The final fine-grained annotated entity corpus consists of 1104 entities and 67,799 tokens. The final IAAs are 0.936 on average (for three annotators), indicating that the fine-grained entity recognition corpus is of high quality. CONCLUSIONS: These results will provide a foundation for future research on corpus construction and named entity recognition tasks in the TCM clinical domain. |
format | Online Article Text |
id | pubmed-7132896 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-71328962020-04-11 Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine Zhang, Tingting Wang, Yaqiang Wang, Xiaofeng Yang, Yafei Ye, Ying BMC Med Inform Decis Mak Research Article BACKGROUND: In this study, we focus on building a fine-grained entity annotation corpus with the corresponding annotation guideline of traditional Chinese medicine (TCM) clinical records. Our aim is to provide a basis for the fine-grained corpus construction of TCM clinical records in future. METHODS: We developed a four-step approach that is suitable for the construction of TCM medical records in our corpus. First, we determined the entity types included in this study through sample annotation. Then, we drafted a fine-grained annotation guideline by summarizing the characteristics of the dataset and referring to some existing guidelines. We iteratively updated the guidelines until the inter-annotator agreement (IAA) exceeded a Cohen’s kappa value of 0.9. Comprehensive annotations were performed while keeping the IAA value above 0.9. RESULTS: We annotated the 10,197 clinical records in five rounds. Four entity categories involving 13 entity types were employed. The final fine-grained annotated entity corpus consists of 1104 entities and 67,799 tokens. The final IAAs are 0.936 on average (for three annotators), indicating that the fine-grained entity recognition corpus is of high quality. CONCLUSIONS: These results will provide a foundation for future research on corpus construction and named entity recognition tasks in the TCM clinical domain. BioMed Central 2020-04-06 /pmc/articles/PMC7132896/ /pubmed/32252745 http://dx.doi.org/10.1186/s12911-020-1079-2 Text en © The Author(s). 
2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Zhang, Tingting Wang, Yaqiang Wang, Xiaofeng Yang, Yafei Ye, Ying Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine |
title | Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine |
title_full | Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine |
title_fullStr | Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine |
title_full_unstemmed | Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine |
title_short | Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine |
title_sort | constructing fine-grained entity recognition corpora based on clinical records of traditional chinese medicine |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7132896/ https://www.ncbi.nlm.nih.gov/pubmed/32252745 http://dx.doi.org/10.1186/s12911-020-1079-2 |
work_keys_str_mv | AT zhangtingting constructingfinegrainedentityrecognitioncorporabasedonclinicalrecordsoftraditionalchinesemedicine AT wangyaqiang constructingfinegrainedentityrecognitioncorporabasedonclinicalrecordsoftraditionalchinesemedicine AT wangxiaofeng constructingfinegrainedentityrecognitioncorporabasedonclinicalrecordsoftraditionalchinesemedicine AT yangyafei constructingfinegrainedentityrecognitioncorporabasedonclinicalrecordsoftraditionalchinesemedicine AT yeying constructingfinegrainedentityrecognitioncorporabasedonclinicalrecordsoftraditionalchinesemedicine |
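
The abstract reports inter-annotator agreement (IAA) as Cohen's kappa, with a threshold of 0.9 and an average of 0.936 over three annotators. As a rough illustration of how such a figure could be computed, the sketch below implements Cohen's kappa for two label sequences and averages it over all annotator pairs. The pairwise averaging, the function names, and the sample labels are assumptions made for this example; the record does not say how the authors combined scores across annotators.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotations):
    """Average Cohen's kappa over every pair of annotators (an assumption)."""
    pairs = list(combinations(annotations, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical token-level labels, not taken from the corpus.
annotator_1 = ["SYMPTOM", "SYMPTOM", "O", "O"]
annotator_2 = ["SYMPTOM", "O", "O", "O"]
annotator_3 = ["SYMPTOM", "SYMPTOM", "O", "O"]

kappa_12 = cohen_kappa(annotator_1, annotator_2)
average = mean_pairwise_kappa([annotator_1, annotator_2, annotator_3])
meets_threshold = average > 0.9  # the 0.9 threshold comes from the abstract
```

In this toy data, annotators 1 and 2 agree on three of four tokens, while chance agreement is 0.5, so their kappa is 0.5. A real check would run over the full set of annotated tokens or entity spans.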