An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data

Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages o...

Bibliographic Details
Main Authors:	Liu, Yuzhe, Gopalakrishnan, Vanathi
Format:	Online Article Text
Language:	English
Published:	2017
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5325161/ https://www.ncbi.nlm.nih.gov/pubmed/28243594 http://dx.doi.org/10.3390/data2010008

author	Liu, Yuzhe Gopalakrishnan, Vanathi
collection	PubMed
description	Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.
format	Online Article Text
id	pubmed-5325161
institution	National Center for Biotechnology Information
language	English
publishDate	2017
record_format	MEDLINE/PubMed
spelling	pubmed-53251612017-03-01 An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data Liu, Yuzhe Gopalakrishnan, Vanathi Data (Basel) Article Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models. 2017-01-25 2017-03 /pmc/articles/PMC5325161/ /pubmed/28243594 http://dx.doi.org/10.3390/data2010008 Text en http://creativecommons.org/licenses/by/4.0/ This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5325161/ https://www.ncbi.nlm.nih.gov/pubmed/28243594 http://dx.doi.org/10.3390/data2010008

Similar Items