Pneumonia and Pulmonary Thromboembolism Classification Using Electronic Health Records
Pneumonia and pulmonary thromboembolism (PTE) are both respiratory diseases; their diagnosis is difficult due to their similarity in symptoms, medical subjectivity, and the large amount of information from different sources necessary for a correct diagnosis. Analysis of such clinical data using comp...
Main Authors: | Siordia-Millán, Sinhue; Torres-Ramos, Sulema; Salido-Ruiz, Ricardo A.; Hernández-Gordillo, Daniel; Pérez-Gutiérrez, Tracy; Román-Godínez, Israel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9601338/ https://www.ncbi.nlm.nih.gov/pubmed/36292225 http://dx.doi.org/10.3390/diagnostics12102536 |
_version_ | 1784817039541534720 |
---|---|
author | Siordia-Millán, Sinhue; Torres-Ramos, Sulema; Salido-Ruiz, Ricardo A.; Hernández-Gordillo, Daniel; Pérez-Gutiérrez, Tracy; Román-Godínez, Israel |
author_facet | Siordia-Millán, Sinhue; Torres-Ramos, Sulema; Salido-Ruiz, Ricardo A.; Hernández-Gordillo, Daniel; Pérez-Gutiérrez, Tracy; Román-Godínez, Israel |
author_sort | Siordia-Millán, Sinhue |
collection | PubMed |
description | Pneumonia and pulmonary thromboembolism (PTE) are both respiratory diseases; their diagnosis is difficult due to their similarity in symptoms, medical subjectivity, and the large amount of information from different sources necessary for a correct diagnosis. Analysis of such clinical data using computational tools could help medical staff reduce time, increase diagnostic certainty, and improve patient care during hospitalization. In addition, no studies have been found that analyze all clinical information on the Mexican population in the Spanish language. Therefore, this work performs automatic diagnosis of pneumonia and pulmonary thromboembolism using machine-learning tools along with clinical laboratory information (structured data) and clinical text (unstructured data) obtained from electronic health records. A cohort of 173 clinical records was obtained from the Mexican Social Security Institute. The data were preprocessed, transformed, and adjusted to be analyzed using several machine-learning algorithms. For structured data, naïve Bayes, support vector machine, decision trees, AdaBoost, random forest, and multilayer perceptron were used; for unstructured data, a BiLSTM was used. K-fold cross-validation and leave-one-out were used for evaluation of structured data, and hold-out was used for unstructured data; additionally, 1-vs.-1 and 1-vs.-rest approaches were used. Structured data results show that the highest AUC-ROC was achieved by the naïve Bayes algorithm classifying PTE vs. pneumonia (87.0%), PTE vs. control (75.1%), and pneumonia vs. control (85.2%) with the 1-vs.-1 approach; for the 1-vs.-rest approach, the best performance was reported in pneumonia vs. rest (86.3%) and PTE vs. rest (79.7%) using naïve Bayes, and control vs. diseases (79.8%) using decision trees. Regarding unstructured data, the results do not present a good AUC-ROC; however, the best F1-scores were obtained for control vs. disease (72.7%) in the 1-vs.-rest approach and control vs. pneumonia (63.6%) in the 1-vs.-1 approach. Additionally, several decision trees were obtained to identify important attributes for automatic diagnosis for structured data, particularly for PTE vs. pneumonia. Based on the experiments, the structured datasets present the highest values. Results suggest using naïve Bayes and structured data to automatically diagnose PTE vs. pneumonia. Moreover, using decision trees allows the observation of some decision criteria that the medical staff could consider for diagnosis. |
format | Online Article Text |
id | pubmed-9601338 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9601338 2022-10-27 Pneumonia and Pulmonary Thromboembolism Classification Using Electronic Health Records Siordia-Millán, Sinhue Torres-Ramos, Sulema Salido-Ruiz, Ricardo A. Hernández-Gordillo, Daniel Pérez-Gutiérrez, Tracy Román-Godínez, Israel Diagnostics (Basel) Article Pneumonia and pulmonary thromboembolism (PTE) are both respiratory diseases; their diagnosis is difficult due to their similarity in symptoms, medical subjectivity, and the large amount of information from different sources necessary for a correct diagnosis. Analysis of such clinical data using computational tools could help medical staff reduce time, increase diagnostic certainty, and improve patient care during hospitalization. In addition, no studies have been found that analyze all clinical information on the Mexican population in the Spanish language. Therefore, this work performs automatic diagnosis of pneumonia and pulmonary thromboembolism using machine-learning tools along with clinical laboratory information (structured data) and clinical text (unstructured data) obtained from electronic health records. A cohort of 173 clinical records was obtained from the Mexican Social Security Institute. The data were preprocessed, transformed, and adjusted to be analyzed using several machine-learning algorithms. For structured data, naïve Bayes, support vector machine, decision trees, AdaBoost, random forest, and multilayer perceptron were used; for unstructured data, a BiLSTM was used. K-fold cross-validation and leave-one-out were used for evaluation of structured data, and hold-out was used for unstructured data; additionally, 1-vs.-1 and 1-vs.-rest approaches were used. Structured data results show that the highest AUC-ROC was achieved by the naïve Bayes algorithm classifying PTE vs. pneumonia (87.0%), PTE vs. control (75.1%), and pneumonia vs. control (85.2%) with the 1-vs.-1 approach; for the 1-vs.-rest approach, the best performance was reported in pneumonia vs. rest (86.3%) and PTE vs. rest (79.7%) using naïve Bayes, and control vs. diseases (79.8%) using decision trees. Regarding unstructured data, the results do not present a good AUC-ROC; however, the best F1-scores were obtained for control vs. disease (72.7%) in the 1-vs.-rest approach and control vs. pneumonia (63.6%) in the 1-vs.-1 approach. Additionally, several decision trees were obtained to identify important attributes for automatic diagnosis for structured data, particularly for PTE vs. pneumonia. Based on the experiments, the structured datasets present the highest values. Results suggest using naïve Bayes and structured data to automatically diagnose PTE vs. pneumonia. Moreover, using decision trees allows the observation of some decision criteria that the medical staff could consider for diagnosis. MDPI 2022-10-19 /pmc/articles/PMC9601338/ /pubmed/36292225 http://dx.doi.org/10.3390/diagnostics12102536 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Siordia-Millán, Sinhue Torres-Ramos, Sulema Salido-Ruiz, Ricardo A. Hernández-Gordillo, Daniel Pérez-Gutiérrez, Tracy Román-Godínez, Israel Pneumonia and Pulmonary Thromboembolism Classification Using Electronic Health Records |
title | Pneumonia and Pulmonary Thromboembolism Classification Using Electronic Health Records |
title_full | Pneumonia and Pulmonary Thromboembolism Classification Using Electronic Health Records |
title_fullStr | Pneumonia and Pulmonary Thromboembolism Classification Using Electronic Health Records |
title_full_unstemmed | Pneumonia and Pulmonary Thromboembolism Classification Using Electronic Health Records |
title_short | Pneumonia and Pulmonary Thromboembolism Classification Using Electronic Health Records |
title_sort | pneumonia and pulmonary thromboembolism classification using electronic health records |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9601338/ https://www.ncbi.nlm.nih.gov/pubmed/36292225 http://dx.doi.org/10.3390/diagnostics12102536 |
work_keys_str_mv | AT siordiamillansinhue pneumoniaandpulmonarythromboembolismclassificationusingelectronichealthrecords AT torresramossulema pneumoniaandpulmonarythromboembolismclassificationusingelectronichealthrecords AT salidoruizricardoa pneumoniaandpulmonarythromboembolismclassificationusingelectronichealthrecords AT hernandezgordillodaniel pneumoniaandpulmonarythromboembolismclassificationusingelectronichealthrecords AT perezgutierreztracy pneumoniaandpulmonarythromboembolismclassificationusingelectronichealthrecords AT romangodinezisrael pneumoniaandpulmonarythromboembolismclassificationusingelectronichealthrecords |
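
The abstract in this record names an evaluation setup (1-vs.-1 and 1-vs.-rest binary tasks over PTE, pneumonia, and control; k-fold and leave-one-out splits; AUC-ROC) without showing how it fits together. The following is a minimal illustrative sketch in plain Python, not the authors' implementation. The function names, the class ordering, and the toy labels and scores are all assumptions made for illustration; the record does not describe the study's code or data layout.

```python
from itertools import combinations

# Class names taken from the abstract; the tuple order is an arbitrary assumption.
CLASSES = ("PTE", "pneumonia", "control")


def one_vs_one_tasks(labels):
    """Yield (pos, neg, indices, binary_labels) for every pair of classes.

    Only records belonging to one of the two classes are kept.
    """
    for pos, neg in combinations(CLASSES, 2):
        idx = [i for i, y in enumerate(labels) if y in (pos, neg)]
        yield pos, neg, idx, [1 if labels[i] == pos else 0 for i in idx]


def one_vs_rest_tasks(labels):
    """Yield (pos, binary_labels): one class against all other records."""
    for pos in CLASSES:
        yield pos, [1 if y == pos else 0 for y in labels]


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over range(n).

    Calling it with k == n gives leave-one-out.
    """
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    for test in folds:
        held_out = set(test)
        yield [i for i in range(n) if i not in held_out], test


def auc_roc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    toy_labels = ["PTE", "pneumonia", "control", "PTE", "control"]
    for pos, neg, idx, y in one_vs_one_tasks(toy_labels):
        print(f"{pos} vs. {neg}: records {idx}, labels {y}")
    for pos, y in one_vs_rest_tasks(toy_labels):
        print(f"{pos} vs. rest: labels {y}")
    # Hypothetical classifier scores for four records.
    print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

In a fuller pipeline, a classifier such as naïve Bayes would be fit on each `train` split and its scores on the `test` split passed to `auc_roc`. The pairwise AUC formula is quadratic in the number of records, which is acceptable for a cohort on the order of the 173 records mentioned in the abstract.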