Use of Machine Learning Techniques for Case-Detection of Varicella Zoster Using Routinely Collected Textual Ambulatory Records: Pilot Observational Study
BACKGROUND: The detection of infectious diseases through the analysis of free text on electronic health reports (EHRs) can provide prompt and accurate background information for the implementation of preventative measures, such as advertising and monitoring the effectiveness of vaccination campaigns...
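Main authors: | Lanera, Corrado; Berchialla, Paola; Baldi, Ileana; Lorenzoni, Giulia; Tramontan, Lara; Scamarcia, Antonio; Cantarutti, Luigi; Giaquinto, Carlo; Gregori, Dario |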
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7238079/ https://www.ncbi.nlm.nih.gov/pubmed/32369038 http://dx.doi.org/10.2196/14330 |
_version_ | 1783536460956172288 |
---|---|
author | Lanera, Corrado Berchialla, Paola Baldi, Ileana Lorenzoni, Giulia Tramontan, Lara Scamarcia, Antonio Cantarutti, Luigi Giaquinto, Carlo Gregori, Dario |
author_sort | Lanera, Corrado |
collection | PubMed |
description | BACKGROUND: The detection of infectious diseases through the analysis of free text on electronic health reports (EHRs) can provide prompt and accurate background information for the implementation of preventative measures, such as advertising and monitoring the effectiveness of vaccination campaigns. OBJECTIVE: The purpose of this paper is to compare machine learning techniques in their application to EHR analysis for disease detection. METHODS: The Pedianet database was used as a data source for a real-world scenario on the identification of cases of varicella. The models’ training and test sets were based on data sets from two different Italian regions (Veneto and Sicilia), comprising 7631 patients and 1,230,355 records, and 2347 patients and 569,926 records, respectively, for whom a gold standard of varicella diagnosis was available. Elastic-net regularized generalized linear model (GLMNet), maximum entropy (MAXENT), and LogitBoost (boosting) algorithms were implemented in a supervised environment and 5-fold cross-validated. The document-term matrix generated by the training set involves a dictionary of 1,871,532 tokens. The analysis was conducted on a subset of 29,096 tokens, corresponding to a matrix with no more than a 99% sparsity ratio. RESULTS: The highest predictive values were achieved through boosting (positive predictive value [PPV] 63.1%, 95% CI 42.7-83.5 and negative predictive value [NPV] 98.8%, 95% CI 98.3-99.3). GLMNet delivered superior predictive capability compared to MAXENT (PPV 24.5% and NPV 98.3% vs PPV 11.0% and NPV 98.0%). MAXENT and GLMNet predictions weakly agree with each other (agreement coefficient 1 [AC1]=0.60, 95% CI 0.58-0.62), as well as with LogitBoost (MAXENT: AC1=0.64, 95% CI 0.63-0.66 and GLMNet: AC1=0.53, 95% CI 0.51-0.55). CONCLUSIONS: Boosting has demonstrated promising performance in large-scale EHR-based infectious disease identification. |
format | Online Article Text |
id | pubmed-7238079 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-7238079 2020-06-01 JMIR Med Inform Original Paper JMIR Publications 2020-05-05 /pmc/articles/PMC7238079/ /pubmed/32369038 http://dx.doi.org/10.2196/14330 Text en ©Corrado Lanera, Paola Berchialla, Ileana Baldi, Giulia Lorenzoni, Lara Tramontan, Antonio Scamarcia, Luigi Cantarutti, Carlo Giaquinto, Dario Gregori. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 05.05.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included. |
title | Use of Machine Learning Techniques for Case-Detection of Varicella Zoster Using Routinely Collected Textual Ambulatory Records: Pilot Observational Study |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7238079/ https://www.ncbi.nlm.nih.gov/pubmed/32369038 http://dx.doi.org/10.2196/14330 |
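
The abstract in this record reports three quantities that are easy to misread: a token subset chosen by a sparsity threshold, predictive values (PPV and NPV), and the AC1 agreement coefficient between classifiers. The Python sketch below illustrates how each could be computed for binary labels. It is not the authors' code: the function names (`drop_sparse_terms`, `predictive_values`, `gwet_ac1`) and the toy inputs are hypothetical, the sparsity filter assumes that "sparsity" means the share of documents that do not contain a token, and the AC1 formula is the standard two-rater, two-category form of Gwet's coefficient.

```python
from collections import Counter


def drop_sparse_terms(docs, max_sparsity=0.99):
    """Return the tokens whose sparsity is at most max_sparsity.

    Assumption: sparsity of a token = share of documents NOT containing it.
    `docs` is a list of token lists (one list per document).
    """
    n_docs = len(docs)
    # Count each token once per document (document frequency).
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {tok for tok, df in doc_freq.items() if 1 - df / n_docs <= max_sparsity}


def predictive_values(y_true, y_pred):
    """Return (PPV, NPV) for binary labels, with 1 = case and 0 = non-case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


def gwet_ac1(pred_a, pred_b):
    """Gwet's AC1 for two sets of binary predictions on the same items."""
    n = len(pred_a)
    # Observed agreement: share of items on which both classifiers agree.
    p_obs = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    # Mean share of items labelled positive across the two classifiers.
    pi = (sum(pred_a) / n + sum(pred_b) / n) / 2
    # Chance agreement for two categories; never exceeds 0.5.
    p_exp = 2 * pi * (1 - pi)
    return (p_obs - p_exp) / (1 - p_exp)


if __name__ == "__main__":
    toy_docs = [["rash", "fever"], ["rash"], ["rash", "cough"], ["rash"]]
    print(drop_sparse_terms(toy_docs, max_sparsity=0.5))
    print(predictive_values([1, 1, 0, 0, 0], [1, 0, 1, 0, 0]))
    print(gwet_ac1([1, 1, 0, 0], [1, 0, 0, 0]))
```

Unlike Cohen's kappa, AC1 bases chance agreement on the mean prevalence of the positive label, which keeps it stable when positive cases are rare, as they are in a case-detection setting.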