RegEMR: a natural language processing system to automatically identify premature ovarian decline from Chinese electronic medical records

Bibliographic Details
Main Authors:	Cai, Jie; Chen, Shenglin; Guo, Siyun; Wang, Suidong; Li, Lintong; Liu, Xiaotong; Zheng, Keming; Liu, Yudong; Chen, Shiling
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10353087/ https://www.ncbi.nlm.nih.gov/pubmed/37464410 http://dx.doi.org/10.1186/s12911-023-02239-8

_version_	1785074645188214784
author	Cai, Jie; Chen, Shenglin; Guo, Siyun; Wang, Suidong; Li, Lintong; Liu, Xiaotong; Zheng, Keming; Liu, Yudong; Chen, Shiling
author_facet	Cai, Jie; Chen, Shenglin; Guo, Siyun; Wang, Suidong; Li, Lintong; Liu, Xiaotong; Zheng, Keming; Liu, Yudong; Chen, Shiling
author_sort	Cai, Jie
collection	PubMed
description	BACKGROUND: The ovarian reserve is a reservoir for reproductive potential. In clinical practice, early detection and treatment of premature ovarian decline characterized by abnormal ovarian reserve tests is regarded as a critical measure to prevent infertility. However, the relevant data are typically stored in an unstructured format in a hospital’s electronic medical record (EMR) system, and their retrieval requires tedious manual abstraction by domain experts. Computational tools are therefore needed to reduce the workload. METHODS: We presented RegEMR, an artificial intelligence tool composed of a rule-based natural language processing (NLP) extractor and a knowledge-based disease scoring model, to automatize the screening procedure of premature ovarian decline using Chinese reproductive EMRs. We used regular expressions (REs) as a text mining method and explored whether REs automatically synthesized by the genetic programming-based online platform RegexGenerator++ could be as effective as manually formulated REs. We also investigated how the representativeness of the learning corpus affected the performance of machine-generated REs. Additionally, we translated the clinical diagnostic criteria into a programmable disease diagnostic model for disease scoring and risk stratification. Four hundred outpatient medical records were collected from a Chinese fertility center. Manual review served as the gold standard, and fivefold cross-validation was used for evaluation. RESULTS: The overall F-score of manually built REs was 0.9444 (95% CI 0.9373 to 0.9515), with no significant difference (paired t-test, p > 0.05) compared with machine-generated REs that could be affected by training set sizes and annotation portions. The extractor performed effectively in automatically tracing the dynamic changes in hormone levels (F-score 0.9518–0.9884) and ultrasonographic measures (F-score 0.9472–0.9822). Applying the extracted information to the proposed diagnostic model, the program obtained an accuracy of 0.98 and a sensitivity of 0.93 in risk screening. For each specific disease, the automatic diagnosis in 76% of patients was consistent with that of the clinical diagnosis, and the kappa coefficient was 0.63. CONCLUSION: A Chinese NLP system named RegEMR was developed to automatically identify high risk of early ovarian aging and diagnose related diseases from Chinese reproductive EMRs. We hope that this system can aid EMR-based data collection and clinical decision support in fertility centers. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02239-8.
format	Online Article Text
id	pubmed-10353087
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10353087 2023-07-19 BMC Med Inform Decis Mak Research BioMed Central 2023-07-18 /pmc/articles/PMC10353087/ /pubmed/37464410 http://dx.doi.org/10.1186/s12911-023-02239-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	RegEMR: a natural language processing system to automatically identify premature ovarian decline from Chinese electronic medical records
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10353087/ https://www.ncbi.nlm.nih.gov/pubmed/37464410 http://dx.doi.org/10.1186/s12911-023-02239-8
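
Illustrative note: the abstract describes extracting hormone values with regular expressions and scoring the extraction against a manually reviewed gold standard using the F-score. The sketch below shows that general idea only. The pattern, the hormone names, the sample note, and the function names are all assumptions made for illustration; they are not taken from RegEMR, whose actual expressions and scoring model are not given in this record.

```python
import re

# Hypothetical pattern (not from RegEMR): a hormone name, an optional
# ASCII or full-width colon, then a decimal number.
HORMONE_RE = re.compile(r"(AMH|FSH|LH|E2)\s*[:：]?\s*(\d+(?:\.\d+)?)")


def extract_hormones(note: str) -> set[tuple[str, float]]:
    """Return the (hormone, value) pairs found in a free-text note."""
    return {(name, float(value)) for name, value in HORMONE_RE.findall(note)}


def f_score(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall over extracted items."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented sample note mixing Chinese and ASCII punctuation.
note = "AMH：0.85 ng/ml，FSH 12.3 IU/L"
predicted = extract_hormones(note)

# Invented gold standard with one item the pattern cannot find in the note.
gold = {("AMH", 0.85), ("FSH", 12.3), ("LH", 4.1)}
score = f_score(predicted, gold)
```

Here the pattern finds two of the three gold-standard items with no false positives, so precision is 1, recall is 2/3, and the F-score is 0.8.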