
Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries and with imbalance in the classes of interest, all of which leads to serious bias in predictive modeling. Since standard data m...


Bibliographic Details
Main authors: Razzaghi, Talayeh; Roderick, Oleg; Safro, Ilya; Marko, Nicholas
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4873242/
https://www.ncbi.nlm.nih.gov/pubmed/27195952
http://dx.doi.org/10.1371/journal.pone.0155119
collection PubMed
description This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: noisy, with missing entries and with imbalance in the classes of interest, all of which leads to serious bias in predictive modeling. Since standard data mining methods often perform poorly on such data, we argue for the development of specialized techniques for data preprocessing and classification. In this paper, we propose a new method that simultaneously classifies large datasets and reduces the effects of missing values. It combines a multilevel framework for the cost-sensitive SVM with the expectation-maximization imputation method for missing values, which relies on iterated regression analyses. We compare the classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values, as well as on real data from health applications, and show that our multilevel SVM-based method produces fast, more accurate, and more robust classification results.
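The record gives no formulas, so the following is only a minimal sketch of what a cost-sensitive (weighted) SVM objective can look like. It assumes a linear model and class penalties inversely proportional to class frequency. Both choices, and the names class_penalties and weighted_svm_objective, are illustrative assumptions and are not taken from the article. The multilevel framework and the imputation step are not shown.

```python
from collections import Counter


def class_penalties(labels, base_c=1.0):
    """Assumed weighting: C_k = base_c * n / (n_classes * n_k).

    Rare classes get a larger penalty, so misclassifying them costs more.
    """
    counts = Counter(labels)
    n = len(labels)
    return {cls: base_c * n / (len(counts) * cnt) for cls, cnt in counts.items()}


def weighted_svm_objective(w, b, X, y, penalties):
    """0.5 * ||w||^2 + sum_i C_{y_i} * max(0, 1 - y_i * (w . x_i + b))."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        loss += penalties[yi] * max(0.0, 1.0 - margin)
    return regularizer + loss


# Toy imbalanced data: four majority points (-1) and one minority point (+1).
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [-1, -1, -1, -1, 1]

penalties = class_penalties(y)
uniform = {-1: 1.0, 1: 1.0}

# A trivial classifier that labels every point as the majority class.
w, b = [0.0], -1.0
weighted_cost = weighted_svm_objective(w, b, X, y, penalties)
uniform_cost = weighted_svm_objective(w, b, X, y, uniform)
print(penalties, weighted_cost, uniform_cost)
```

On this toy set, the classifier that ignores the minority class is charged more under the weighted objective than under uniform penalties, which is the effect a cost-sensitive formulation is meant to have on imbalanced data.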
format Online
Article
Text
id pubmed-4873242
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4873242 2016-06-09. Razzaghi, Talayeh; Roderick, Oleg; Safro, Ilya; Marko, Nicholas. Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values. PLoS One, Research Article. Public Library of Science, 2016-05-19. /pmc/articles/PMC4873242/ /pubmed/27195952 http://dx.doi.org/10.1371/journal.pone.0155119 Text, en. © 2016 Razzaghi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values
topic Research Article