Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records
BACKGROUND: Distinguishing cases from non-cases in free-text electronic medical records is an important initial step in observational epidemiological studies, but manual record validation is time-consuming and cumbersome. We compared different approaches to develop an automatic case identification s...
Main Authors: | Afzal, Zubair; Schuemie, Martijn J; van Blijderveen, Jan C; Sen, Elif F; Sturkenboom, Miriam CJM; Kors, Jan A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3602667/ https://www.ncbi.nlm.nih.gov/pubmed/23452306 http://dx.doi.org/10.1186/1472-6947-13-30 |
_version_ | 1782263591656751104 |
---|---|
author | Afzal, Zubair Schuemie, Martijn J van Blijderveen, Jan C Sen, Elif F Sturkenboom, Miriam CJM Kors, Jan A |
author_facet | Afzal, Zubair Schuemie, Martijn J van Blijderveen, Jan C Sen, Elif F Sturkenboom, Miriam CJM Kors, Jan A |
author_sort | Afzal, Zubair |
collection | PubMed |
description | BACKGROUND: Distinguishing cases from non-cases in free-text electronic medical records is an important initial step in observational epidemiological studies, but manual record validation is time-consuming and cumbersome. We compared different approaches to develop an automatic case identification system with high sensitivity to assist manual annotators. METHODS: We used four different machine-learning algorithms to build case identification systems for two data sets, one comprising hepatobiliary disease patients, the other acute renal failure patients. To improve the sensitivity of the systems, we varied the imbalance ratio between positive cases and negative cases using under- and over-sampling techniques, and applied cost-sensitive learning with various misclassification costs. RESULTS: For the hepatobiliary data set, we obtained a high sensitivity of 0.95 (on a par with manual annotators, as compared to 0.91 for a baseline classifier) with specificity 0.56. For the acute renal failure data set, sensitivity increased from 0.69 to 0.89, with specificity 0.59. Performance differences between the various machine-learning algorithms were not large. Classifiers performed best when trained on data sets with imbalance ratio below 10. CONCLUSIONS: We were able to achieve high sensitivity with moderate specificity for automatic case identification on two data sets of electronic medical records. Such a high-sensitive case identification system can be used as a pre-filter to significantly reduce the burden of manual record validation. |
format | Online Article Text |
id | pubmed-3602667 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3602667 2013-03-21 Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records Afzal, Zubair Schuemie, Martijn J van Blijderveen, Jan C Sen, Elif F Sturkenboom, Miriam CJM Kors, Jan A BMC Med Inform Decis Mak Research Article BACKGROUND: Distinguishing cases from non-cases in free-text electronic medical records is an important initial step in observational epidemiological studies, but manual record validation is time-consuming and cumbersome. We compared different approaches to develop an automatic case identification system with high sensitivity to assist manual annotators. METHODS: We used four different machine-learning algorithms to build case identification systems for two data sets, one comprising hepatobiliary disease patients, the other acute renal failure patients. To improve the sensitivity of the systems, we varied the imbalance ratio between positive cases and negative cases using under- and over-sampling techniques, and applied cost-sensitive learning with various misclassification costs. RESULTS: For the hepatobiliary data set, we obtained a high sensitivity of 0.95 (on a par with manual annotators, as compared to 0.91 for a baseline classifier) with specificity 0.56. For the acute renal failure data set, sensitivity increased from 0.69 to 0.89, with specificity 0.59. Performance differences between the various machine-learning algorithms were not large. Classifiers performed best when trained on data sets with imbalance ratio below 10. CONCLUSIONS: We were able to achieve high sensitivity with moderate specificity for automatic case identification on two data sets of electronic medical records. Such a high-sensitive case identification system can be used as a pre-filter to significantly reduce the burden of manual record validation.
BioMed Central 2013-03-02 /pmc/articles/PMC3602667/ /pubmed/23452306 http://dx.doi.org/10.1186/1472-6947-13-30 Text en Copyright ©2013 Afzal et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Afzal, Zubair Schuemie, Martijn J van Blijderveen, Jan C Sen, Elif F Sturkenboom, Miriam CJM Kors, Jan A Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records |
title | Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records |
title_full | Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records |
title_fullStr | Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records |
title_full_unstemmed | Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records |
title_short | Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records |
title_sort | improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3602667/ https://www.ncbi.nlm.nih.gov/pubmed/23452306 http://dx.doi.org/10.1186/1472-6947-13-30 |
work_keys_str_mv | AT afzalzubair improvingsensitivityofmachinelearningmethodsforautomatedcaseidentificationfromfreetextelectronicmedicalrecords AT schuemiemartijnj improvingsensitivityofmachinelearningmethodsforautomatedcaseidentificationfromfreetextelectronicmedicalrecords AT vanblijderveenjanc improvingsensitivityofmachinelearningmethodsforautomatedcaseidentificationfromfreetextelectronicmedicalrecords AT seneliff improvingsensitivityofmachinelearningmethodsforautomatedcaseidentificationfromfreetextelectronicmedicalrecords AT sturkenboommiriamcjm improvingsensitivityofmachinelearningmethodsforautomatedcaseidentificationfromfreetextelectronicmedicalrecords AT korsjana improvingsensitivityofmachinelearningmethodsforautomatedcaseidentificationfromfreetextelectronicmedicalrecords |
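
Illustrative note on the abstract: the description field refers to sensitivity, specificity, and to lowering the imbalance ratio by under-sampling. The sketch below shows how those quantities could be computed. It is not the authors' implementation; the function names are hypothetical, and the imbalance ratio is assumed here to mean the number of negative records per positive record.

```python
import random
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = case and 0 = non-case."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def imbalance_ratio(labels: Sequence[int]) -> float:
    """Negatives per positive (assumed definition, see the note above)."""
    positives = sum(1 for y in labels if y == 1)
    if positives == 0:
        raise ValueError("imbalance ratio is undefined without positive cases")
    return (len(labels) - positives) / positives


def undersample_negatives(
    records: Sequence[object],
    labels: Sequence[int],
    target_ratio: float,
    seed: int = 0,
) -> Tuple[List[object], List[int]]:
    """Randomly drop negative records until at most target_ratio negatives per positive remain."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg_idx), int(target_ratio * len(pos_idx)))
    keep = sorted(pos_idx + rng.sample(neg_idx, n_keep))  # preserve original record order
    return [records[i] for i in keep], [labels[i] for i in keep]
```

For example, a toy set with 5 cases and 100 non-cases has an imbalance ratio of 20; calling `undersample_negatives(records, labels, target_ratio=10)` would keep all 5 cases and a random 50 non-cases. Over-sampling and cost-sensitive learning, also named in the abstract, are not shown.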