Use of Machine Learning Techniques for Case-Detection of Varicella Zoster Using Routinely Collected Textual Ambulatory Records: Pilot Observational Study

BACKGROUND: The detection of infectious diseases through the analysis of free text on electronic health reports (EHRs) can provide prompt and accurate background information for the implementation of preventative measures, such as advertising and monitoring the effectiveness of vaccination campaigns...

Bibliographic Details
Main Authors: Lanera, Corrado, Berchialla, Paola, Baldi, Ileana, Lorenzoni, Giulia, Tramontan, Lara, Scamarcia, Antonio, Cantarutti, Luigi, Giaquinto, Carlo, Gregori, Dario
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7238079/
https://www.ncbi.nlm.nih.gov/pubmed/32369038
http://dx.doi.org/10.2196/14330
_version_ 1783536460956172288
author Lanera, Corrado
Berchialla, Paola
Baldi, Ileana
Lorenzoni, Giulia
Tramontan, Lara
Scamarcia, Antonio
Cantarutti, Luigi
Giaquinto, Carlo
Gregori, Dario
author_sort Lanera, Corrado
collection PubMed
description BACKGROUND: The detection of infectious diseases through the analysis of free text on electronic health reports (EHRs) can provide prompt and accurate background information for the implementation of preventative measures, such as advertising and monitoring the effectiveness of vaccination campaigns. OBJECTIVE: The purpose of this paper is to compare machine learning techniques in their application to EHR analysis for disease detection. METHODS: The Pedianet database was used as a data source for a real-world scenario on the identification of cases of varicella. The models’ training and test sets were based on two different Italian regions’ (Veneto and Sicilia) data sets of 7631 patients and 1,230,355 records, and 2347 patients and 569,926 records, respectively, for whom a gold standard of varicella diagnosis was available. Elastic-net regularized generalized linear model (GLMNet), maximum entropy (MAXENT), and LogitBoost (boosting) algorithms were implemented in a supervised environment and 5-fold cross-validated. The document-term matrix generated by the training set involves a dictionary of 1,871,532 tokens. The analysis was conducted on a subset of 29,096 tokens, corresponding to a matrix with no more than a 99% sparsity ratio. RESULTS: The highest predictive values were achieved through boosting (positive predictive value [PPV] 63.1%, 95% CI 42.7-83.5 and negative predictive value [NPV] 98.8%, 95% CI 98.3-99.3). GLMNet delivered superior predictive capability compared to MAXENT (PPV 24.5% and NPV 98.3% vs PPV 11.0% and NPV 98.0%). MAXENT and GLMNet predictions weakly agree with each other (agreement coefficient 1 [AC1]=0.60, 95% CI 0.58-0.62), as well as with LogitBoost (MAXENT: AC1=0.64, 95% CI 0.63-0.66 and GLMNet: AC1=0.53, 95% CI 0.51-0.55). CONCLUSIONS: Boosting has demonstrated promising performance in large-scale EHR-based infectious disease identification.
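The abstract above reports three kinds of summary figures: PPV, NPV, and Gwet's agreement coefficient AC1 between pairs of classifiers. The following is a minimal sketch of how such figures can be computed for binary labels. It is not the authors' code; the function names `predictive_values` and `gwet_ac1` and the toy label vectors are assumptions made purely for illustration, and the AC1 formula shown is the standard two-rater, two-category form.

```python
import math
from typing import Sequence, Tuple


def predictive_values(truth: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    """Return (PPV, NPV) for binary labels coded as 0/1.

    PPV = TP / (TP + FP), NPV = TN / (TN + FN).
    A value is NaN when its denominator is zero.
    """
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp) if (tp + fp) else math.nan
    npv = tn / (tn + fn) if (tn + fn) else math.nan
    return ppv, npv


def gwet_ac1(a: Sequence[int], b: Sequence[int]) -> float:
    """Gwet's AC1 for two raters (here: two classifiers) and two categories.

    p_a is the observed proportion of agreement; the chance-agreement
    term is p_e = 2 * pi * (1 - pi), where pi is the mean proportion of
    items labeled positive across both raters.
    """
    n = len(a)
    p_a = sum(1 for x, y in zip(a, b) if x == y) / n
    pi = (sum(a) + sum(b)) / (2 * n)
    p_e = 2 * pi * (1 - pi)
    return (p_a - p_e) / (1 - p_e)


# Toy data (hypothetical, not taken from the study).
gold = [1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 0, 1, 0, 0, 0, 0, 0]
model_b = [1, 1, 0, 0, 0, 0, 0, 0]

ppv, npv = predictive_values(gold, model_a)
ac1 = gwet_ac1(model_a, model_b)
print(f"PPV={ppv:.3f} NPV={npv:.3f} AC1={ac1:.3f}")
```

With rare positive cases, as in case detection, NPV tends to stay high even for weak classifiers, which is one reason the abstract's PPV figures differ far more across models than its NPV figures.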
format Online
Article
Text
id pubmed-7238079
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-7238079 2020-06-01 Use of Machine Learning Techniques for Case-Detection of Varicella Zoster Using Routinely Collected Textual Ambulatory Records: Pilot Observational Study Lanera, Corrado Berchialla, Paola Baldi, Ileana Lorenzoni, Giulia Tramontan, Lara Scamarcia, Antonio Cantarutti, Luigi Giaquinto, Carlo Gregori, Dario JMIR Med Inform Original Paper JMIR Publications 2020-05-05 /pmc/articles/PMC7238079/ /pubmed/32369038 http://dx.doi.org/10.2196/14330 Text en ©Corrado Lanera, Paola Berchialla, Ileana Baldi, Giulia Lorenzoni, Lara Tramontan, Antonio Scamarcia, Luigi Cantarutti, Carlo Giaquinto, Dario Gregori. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 05.05.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title Use of Machine Learning Techniques for Case-Detection of Varicella Zoster Using Routinely Collected Textual Ambulatory Records: Pilot Observational Study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7238079/
https://www.ncbi.nlm.nih.gov/pubmed/32369038
http://dx.doi.org/10.2196/14330