Cargando…
Clinical assistant decision-making model of tuberculosis based on electronic health records
BACKGROUND: Tuberculosis is a dangerous infectious disease with the largest number of reported cases in China every year. Preventing missed diagnosis has an important impact on the prevention, treatment, and recovery of tuberculosis. The earliest pulmonary tuberculosis prediction models mainly used...
Autores principales: | , , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2023
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10022184/ https://www.ncbi.nlm.nih.gov/pubmed/36927471 http://dx.doi.org/10.1186/s13040-023-00328-y |
_version_ | 1784908673896677376 |
---|---|
author | Wang, Mengying Lee, Cuixia Wei, Zhenhao Ji, Hong Yang, Yingyun Yang, Cheng |
author_facet | Wang, Mengying Lee, Cuixia Wei, Zhenhao Ji, Hong Yang, Yingyun Yang, Cheng |
author_sort | Wang, Mengying |
collection | PubMed |
description | BACKGROUND: Tuberculosis is a dangerous infectious disease with the largest number of reported cases in China every year. Preventing missed diagnosis has an important impact on the prevention, treatment, and recovery of tuberculosis. The earliest pulmonary tuberculosis prediction models mainly used traditional image data combined with neural network models. However, a single data source tends to miss important information, such as primary symptoms and laboratory test results, that is available in multi-source data like medical records and tests. In this study, we propose a multi-stream integrated pulmonary tuberculosis diagnosis model based on structured and unstructured multi-source data from electronic health records. With the limited number of lung specialists and the high prevalence of tuberculosis, the application of this auxiliary diagnosis model can make substantial contributions to clinical settings. METHODS: The subjects were patients at the respiratory department and infectious cases department of a large comprehensive hospital in China between 2015 to 2020. A total of 95,294 medical records were selected through a quality control process. Each record contains structured and unstructured data. First, numerical expressions of features for structured data were created. Then, feature engineering was performed through decision tree model, random forest, and GBDT. Features were included in the feature exclusion set as per their weights in descending order. When the importance of the set was higher than 0.7, this process was concluded. Finally, the contained features were used for model training. In addition, the unstructured free-text data was segmented at the character level and input into the model after indexing. Tuberculosis prediction was conducted through a multi-stream integration tuberculosis diagnosis model (MSI-PTDM), and the evaluation indices of accuracy, AUC, sensitivity, and specificity were compared against the prediction results of XGBoost, Text-CNN, Random Forest, SVM, and so on. RESULTS: Through a variety of characteristic engineering methods, 20 characteristic factors, such as main complaint hemoptysis, cough, and test erythrocyte sedimentation rate, were selected, and the influencing factors were analyzed using the Chinese diagnostic standard of pulmonary tuberculosis. The area under the curve values for MSI-PTDM, XGBoost, Text-CNN, RF, and SVM were 0.9858, 0.9571, 0.9486, 0.9428, and 0.9429, respectively. The sensitivity, specificity, and accuracy of MSI-PTDM were 93.18%, 96.96%, and 96.96%, respectively. The MSI-PTDM prediction model was installed at a doctor workstation and operated in a real clinic environment for 4 months. A total of 692,949 patients were monitored, including 484 patients with confirmed pulmonary tuberculosis. The model predicted 440 cases of pulmonary tuberculosis. The positive sample recognition rate was 90.91%, the false-positive rate was 9.09%, the negative sample recognition rate was 96.17%, and the false-negative rate was 3.83%. CONCLUSIONS: MSI-PTDM can process sparse data, dense data, and unstructured text data concurrently. The model adds a feature domain vector embedding the medical sparse features, and the single-valued sparse vectors are represented by multi-dimensional dense hidden vectors, which not only enhances the feature expression but also alleviates the side effects of sparsity on the model training. However, there may be information loss when features are extracted from text, and adding the processing of original unstructured text makes up for the error within the above process to a certain extent, so that the model can learn data more comprehensively and effectively. In addition, MSI-PTDM also allows interaction between features, considers the combination effect between patient features, adds more complex nonlinear calculation considerations, and improves the learning ability of the model. It has been verified using a test set and via deployment within an actual outpatient environment. |
format | Online Article Text |
id | pubmed-10022184 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-100221842023-03-18 Clinical assistant decision-making model of tuberculosis based on electronic health records Wang, Mengying Lee, Cuixia Wei, Zhenhao Ji, Hong Yang, Yingyun Yang, Cheng BioData Min Research BACKGROUND: Tuberculosis is a dangerous infectious disease with the largest number of reported cases in China every year. Preventing missed diagnosis has an important impact on the prevention, treatment, and recovery of tuberculosis. The earliest pulmonary tuberculosis prediction models mainly used traditional image data combined with neural network models. However, a single data source tends to miss important information, such as primary symptoms and laboratory test results, that is available in multi-source data like medical records and tests. In this study, we propose a multi-stream integrated pulmonary tuberculosis diagnosis model based on structured and unstructured multi-source data from electronic health records. With the limited number of lung specialists and the high prevalence of tuberculosis, the application of this auxiliary diagnosis model can make substantial contributions to clinical settings. METHODS: The subjects were patients at the respiratory department and infectious cases department of a large comprehensive hospital in China between 2015 to 2020. A total of 95,294 medical records were selected through a quality control process. Each record contains structured and unstructured data. First, numerical expressions of features for structured data were created. Then, feature engineering was performed through decision tree model, random forest, and GBDT. Features were included in the feature exclusion set as per their weights in descending order. When the importance of the set was higher than 0.7, this process was concluded. Finally, the contained features were used for model training. In addition, the unstructured free-text data was segmented at the character level and input into the model after indexing. Tuberculosis prediction was conducted through a multi-stream integration tuberculosis diagnosis model (MSI-PTDM), and the evaluation indices of accuracy, AUC, sensitivity, and specificity were compared against the prediction results of XGBoost, Text-CNN, Random Forest, SVM, and so on. RESULTS: Through a variety of characteristic engineering methods, 20 characteristic factors, such as main complaint hemoptysis, cough, and test erythrocyte sedimentation rate, were selected, and the influencing factors were analyzed using the Chinese diagnostic standard of pulmonary tuberculosis. The area under the curve values for MSI-PTDM, XGBoost, Text-CNN, RF, and SVM were 0.9858, 0.9571, 0.9486, 0.9428, and 0.9429, respectively. The sensitivity, specificity, and accuracy of MSI-PTDM were 93.18%, 96.96%, and 96.96%, respectively. The MSI-PTDM prediction model was installed at a doctor workstation and operated in a real clinic environment for 4 months. A total of 692,949 patients were monitored, including 484 patients with confirmed pulmonary tuberculosis. The model predicted 440 cases of pulmonary tuberculosis. The positive sample recognition rate was 90.91%, the false-positive rate was 9.09%, the negative sample recognition rate was 96.17%, and the false-negative rate was 3.83%. CONCLUSIONS: MSI-PTDM can process sparse data, dense data, and unstructured text data concurrently. The model adds a feature domain vector embedding the medical sparse features, and the single-valued sparse vectors are represented by multi-dimensional dense hidden vectors, which not only enhances the feature expression but also alleviates the side effects of sparsity on the model training. However, there may be information loss when features are extracted from text, and adding the processing of original unstructured text makes up for the error within the above process to a certain extent, so that the model can learn data more comprehensively and effectively. In addition, MSI-PTDM also allows interaction between features, considers the combination effect between patient features, adds more complex nonlinear calculation considerations, and improves the learning ability of the model. It has been verified using a test set and via deployment within an actual outpatient environment. BioMed Central 2023-03-16 /pmc/articles/PMC10022184/ /pubmed/36927471 http://dx.doi.org/10.1186/s13040-023-00328-y Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Wang, Mengying Lee, Cuixia Wei, Zhenhao Ji, Hong Yang, Yingyun Yang, Cheng Clinical assistant decision-making model of tuberculosis based on electronic health records |
title | Clinical assistant decision-making model of tuberculosis based on electronic health records |
title_full | Clinical assistant decision-making model of tuberculosis based on electronic health records |
title_fullStr | Clinical assistant decision-making model of tuberculosis based on electronic health records |
title_full_unstemmed | Clinical assistant decision-making model of tuberculosis based on electronic health records |
title_short | Clinical assistant decision-making model of tuberculosis based on electronic health records |
title_sort | clinical assistant decision-making model of tuberculosis based on electronic health records |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10022184/ https://www.ncbi.nlm.nih.gov/pubmed/36927471 http://dx.doi.org/10.1186/s13040-023-00328-y |
work_keys_str_mv | AT wangmengying clinicalassistantdecisionmakingmodeloftuberculosisbasedonelectronichealthrecords AT leecuixia clinicalassistantdecisionmakingmodeloftuberculosisbasedonelectronichealthrecords AT weizhenhao clinicalassistantdecisionmakingmodeloftuberculosisbasedonelectronichealthrecords AT jihong clinicalassistantdecisionmakingmodeloftuberculosisbasedonelectronichealthrecords AT yangyingyun clinicalassistantdecisionmakingmodeloftuberculosisbasedonelectronichealthrecords AT yangcheng clinicalassistantdecisionmakingmodeloftuberculosisbasedonelectronichealthrecords |