Pediatric Injury Surveillance From Uncoded Emergency Department Admission Records in Italy: Machine Learning–Based Text-Mining Approach

BACKGROUND: Unintentional injury is the leading cause of death in young children. Emergency department (ED) diagnoses are a useful source of information for injury epidemiological surveillance purposes. However, ED data collection systems often use free-text fields to report patient diagnoses. Machi...

Bibliographic Details
Main authors: Azzolina, Danila, Bressan, Silvia, Lorenzoni, Giulia, Baldan, Giulia Andrea, Bartolotta, Patrizia, Scognamiglio, Federico, Francavilla, Andrea, Lanera, Corrado, Da Dalt, Liviana, Gregori, Dario
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10372563/
https://www.ncbi.nlm.nih.gov/pubmed/37436799
http://dx.doi.org/10.2196/44467
author Azzolina, Danila
Bressan, Silvia
Lorenzoni, Giulia
Baldan, Giulia Andrea
Bartolotta, Patrizia
Scognamiglio, Federico
Francavilla, Andrea
Lanera, Corrado
Da Dalt, Liviana
Gregori, Dario
author_sort Azzolina, Danila
collection PubMed
description BACKGROUND: Unintentional injury is the leading cause of death in young children. Emergency department (ED) diagnoses are a useful source of information for injury epidemiological surveillance purposes. However, ED data collection systems often use free-text fields to report patient diagnoses. Machine learning techniques (MLTs) are powerful tools for automatic text classification. The MLT system is useful to improve injury surveillance by speeding up the manual free-text coding tasks of ED diagnoses.
OBJECTIVE: This research aims to develop a tool for automatic free-text classification of ED diagnoses to automatically identify injury cases. The automatic classification system also serves for epidemiological purposes to identify the burden of pediatric injuries in Padua, a large province in the Veneto region in Northeast Italy.
METHODS: The study includes 283,468 pediatric admissions between 2007 and 2018 to the Padova University Hospital ED, a large referral center in Northern Italy. Each record reports a diagnosis by free text. The records are standard tools for reporting patient diagnoses. An expert pediatrician manually classified a randomly extracted sample of approximately 40,000 diagnoses. This study sample served as the gold standard to train an MLT classifier. After preprocessing, a document-term matrix was created. The machine learning classifiers, including decision tree, random forest, gradient boosting method (GBM), and support vector machine (SVM), were tuned by 4-fold cross-validation. The injury diagnoses were classified into 3 hierarchical classification tasks, as follows: injury versus noninjury (task A), intentional versus unintentional injury (task B), and type of unintentional injury (task C), according to the World Health Organization classification of injuries.
RESULTS: The SVM classifier achieved the highest performance accuracy (94.14%) in classifying injury versus noninjury cases (task A). The GBM method produced the best results (92% accuracy) for the unintentional and intentional injury classification task (task B). The highest accuracy for the unintentional injury subclassification (task C) was achieved by the SVM classifier. The SVM, random forest, and GBM algorithms performed similarly against the gold standard across different tasks.
CONCLUSIONS: This study shows that MLTs are promising techniques for improving epidemiological surveillance, allowing for the automatic classification of pediatric ED free-text diagnoses. The MLTs revealed a suitable classification performance, especially for general injuries and intentional injury classification. This automatic classification could facilitate the epidemiological surveillance of pediatric injuries by also reducing the health professionals’ efforts in manually classifying diagnoses for research purposes.
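The methods summarized in the abstract (a document-term matrix, 4-fold cross-validation, and three hierarchical tasks) can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' code: the tokenizer, the function names (`document_term_matrix`, `k_fold_indices`, `classify_hierarchically`), the label strings, and the sample diagnoses are all hypothetical, and the cascade order A, then B, then C is an assumption about how the hierarchy might be applied. The actual classifiers (decision tree, random forest, GBM, SVM) are left as pluggable callables.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a simplistic stand-in
    for the preprocessing step, which the abstract does not detail)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def document_term_matrix(docs):
    """Return (vocab, rows): rows[i][j] is the count of vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]


def k_fold_indices(n, k=4):
    """Yield (train, test) index lists for k contiguous folds over range(n)."""
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def classify_hierarchically(text, task_a, task_b, task_c):
    """Assumed cascade: task B runs only for injuries, task C only for
    unintentional injuries. Each task is any callable text -> label."""
    if task_a(text) != "injury":
        return ("noninjury", None, None)
    intent = task_b(text)
    if intent != "unintentional":
        return ("injury", intent, None)
    return ("injury", intent, task_c(text))


# Hypothetical free-text diagnoses, for illustration only.
diagnoses = [
    "trauma cranico minore",
    "febbre e tosse",
    "frattura del polso",
    "trauma del polso",
]
vocab, dtm = document_term_matrix(diagnoses)
print(len(vocab), "terms;", len(dtm), "documents")
for train_idx, test_idx in k_fold_indices(len(diagnoses), k=4):
    # A real pipeline would fit a classifier on the training rows of dtm
    # and score it on the held-out rows here.
    print("train:", train_idx, "test:", test_idx)
```

In practice the folds would be shuffled (and probably stratified by class) before splitting, and hyperparameters would be chosen by comparing the mean held-out score across the four folds.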
format Online
Article
Text
id pubmed-10372563
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10372563 2023-07-28 Pediatric Injury Surveillance From Uncoded Emergency Department Admission Records in Italy: Machine Learning–Based Text-Mining Approach Azzolina, Danila Bressan, Silvia Lorenzoni, Giulia Baldan, Giulia Andrea Bartolotta, Patrizia Scognamiglio, Federico Francavilla, Andrea Lanera, Corrado Da Dalt, Liviana Gregori, Dario JMIR Public Health Surveill Original Paper JMIR Publications 2023-07-12 /pmc/articles/PMC10372563/ /pubmed/37436799 http://dx.doi.org/10.2196/44467 Text en ©Danila Azzolina, Silvia Bressan, Giulia Lorenzoni, Giulia Andrea Baldan, Patrizia Bartolotta, Federico Scognamiglio, Andrea Francavilla, Corrado Lanera, Liviana Da Dalt, Dario Gregori. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 12.07.2023.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.
title Pediatric Injury Surveillance From Uncoded Emergency Department Admission Records in Italy: Machine Learning–Based Text-Mining Approach
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10372563/
https://www.ncbi.nlm.nih.gov/pubmed/37436799
http://dx.doi.org/10.2196/44467