Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values
This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, with imbalance in classes of interests, leading to serious bias in predictive modeling. Since standard data m...
Main authors: | Razzaghi, Talayeh; Roderick, Oleg; Safro, Ilya; Marko, Nicholas |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2016 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4873242/ https://www.ncbi.nlm.nih.gov/pubmed/27195952 http://dx.doi.org/10.1371/journal.pone.0155119 |
_version_ | 1782432870618365952 |
---|---|
author | Razzaghi, Talayeh; Roderick, Oleg; Safro, Ilya; Marko, Nicholas |
author_facet | Razzaghi, Talayeh; Roderick, Oleg; Safro, Ilya; Marko, Nicholas |
author_sort | Razzaghi, Talayeh |
collection | PubMed |
description | This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, with imbalance in classes of interests, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for development of specialized techniques of data-preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expected maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values as well as real data in health applications, and show that our multilevel SVM-based method produces fast, and more accurate and robust classification results. |
format | Online Article Text |
id | pubmed-4873242 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-48732422016-06-09 Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values Razzaghi, Talayeh Roderick, Oleg Safro, Ilya Marko, Nicholas PLoS One Research Article This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries, with imbalance in classes of interests, leading to serious bias in predictive modeling. Since standard data mining methods often produce poor performance measures, we argue for development of specialized techniques of data-preprocessing and classification. In this paper, we propose a new method to simultaneously classify large datasets and reduce the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM and the expected maximization imputation method for missing values, which relies on iterated regression analyses. We compare classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values as well as real data in health applications, and show that our multilevel SVM-based method produces fast, and more accurate and robust classification results. Public Library of Science 2016-05-19 /pmc/articles/PMC4873242/ /pubmed/27195952 http://dx.doi.org/10.1371/journal.pone.0155119 Text en © 2016 Razzaghi et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Razzaghi, Talayeh Roderick, Oleg Safro, Ilya Marko, Nicholas Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values |
title | Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values |
title_full | Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values |
title_fullStr | Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values |
title_full_unstemmed | Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values |
title_short | Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values |
title_sort | multilevel weighted support vector machine for classification on healthcare data with missing values |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4873242/ https://www.ncbi.nlm.nih.gov/pubmed/27195952 http://dx.doi.org/10.1371/journal.pone.0155119 |
work_keys_str_mv | AT razzaghitalayeh multilevelweightedsupportvectormachineforclassificationonhealthcaredatawithmissingvalues AT roderickoleg multilevelweightedsupportvectormachineforclassificationonhealthcaredatawithmissingvalues AT safroilya multilevelweightedsupportvectormachineforclassificationonhealthcaredatawithmissingvalues AT markonicholas multilevelweightedsupportvectormachineforclassificationonhealthcaredatawithmissingvalues |