Impact of Missing Value Imputation on Classification for DNA Microarray Gene Expression Data—A Model-Based Study

Many missing-value (MV) imputation methods have been developed for microarray data, but only a few studies have investigated the relationship between MV imputation and classification accuracy. Furthermore, these studies are problematic in fundamental steps such as MV generation and classifier error...

Bibliographic Details
Main Authors:	Sun, Youting; Braga-Neto, Ulisses; Dougherty, Edward R
Format:	Online Article Text
Language:	English
Published:	Springer 2010
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3171429/ https://www.ncbi.nlm.nih.gov/pubmed/20224634 http://dx.doi.org/10.1155/2009/504069

author	Sun, Youting; Braga-Neto, Ulisses; Dougherty, Edward R
collection	PubMed
description	Many missing-value (MV) imputation methods have been developed for microarray data, but only a few studies have investigated the relationship between MV imputation and classification accuracy. Furthermore, these studies are problematic in fundamental steps such as MV generation and classifier error estimation. In this work, we carry out a model-based study that addresses some of the issues in previous studies. Six popular imputation algorithms, two feature selection methods, and three classification rules are considered. The results suggest that it is beneficial to apply MV imputation when the noise level is high, variance is small, or gene-cluster correlation is strong, under small to moderate MV rates. In these cases, if data quality metrics are available, then it may be helpful to consider the data point with poor quality as missing and apply one of the most robust imputation algorithms to estimate the true signal based on the available high-quality data points. However, at large MV rates, we conclude that imputation methods are not recommended. Regarding the MV rate, our results indicate the presence of a peaking phenomenon: performance of imputation methods actually improves initially as the MV rate increases, but after an optimum point, performance quickly deteriorates with increasing MV rates.
format	Online Article Text
id	pubmed-3171429
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	Springer
record_format	MEDLINE/PubMed
spelling	pubmed-3171429 2011-09-13 EURASIP J Bioinform Syst Biol Research Article Springer 2010-01-04 /pmc/articles/PMC3171429/ /pubmed/20224634 http://dx.doi.org/10.1155/2009/504069 Text en Copyright © 2009 Youting Sun et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Impact of Missing Value Imputation on Classification for DNA Microarray Gene Expression Data—A Model-Based Study
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3171429/ https://www.ncbi.nlm.nih.gov/pubmed/20224634 http://dx.doi.org/10.1155/2009/504069
