
Missing value estimation for DNA microarray gene expression data by Support Vector Regression imputation and orthogonal coding scheme

BACKGROUND: Gene expression profiling has become a useful biological resource in recent years, and it plays an important role in a broad range of areas in biology. The raw gene expression data, usually in the form of a large matrix, may contain missing values. The downstream analysis methods that post...


Bibliographic Details
Main Authors: Wang, Xian; Li, Ao; Jiang, Zhaohui; Feng, Huanqing
Format: Text
Language: English
Published: BioMed Central 2006
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1403803/
https://www.ncbi.nlm.nih.gov/pubmed/16426462
http://dx.doi.org/10.1186/1471-2105-7-32
author Wang, Xian
Li, Ao
Jiang, Zhaohui
Feng, Huanqing
collection PubMed
description BACKGROUND: Gene expression profiling has become a useful biological resource in recent years, and it plays an important role in a broad range of areas in biology. The raw gene expression data, usually in the form of a large matrix, may contain missing values. Downstream analysis methods that require a complete matrix as input are therefore not applicable. Several methods have been developed to solve this problem, such as K-nearest-neighbor imputation and Bayesian principal component analysis imputation. In this paper, we introduce a novel imputation approach based on the Support Vector Regression (SVR) method. The proposed approach uses an orthogonal input coding scheme, which makes use of multiple missing values in one row of a gene expression profile and performs the imputation in a much higher-dimensional space, to obtain better performance. RESULTS: We present a comparative study of our method against previously developed methods for estimating missing values on six gene expression data sets. Among the three input-vector coding schemes we tried, the orthogonal input coding scheme gives the best estimation results, with the minimum Normalized Root Mean Squared Error (NRMSE). The results also demonstrate that the SVR method estimates well on different kinds of data sets, with relatively small NRMSE. CONCLUSION: In the present study, the SVR imputation method performs better than, or at least comparably with, the previously developed methods. Its estimation ability is partly due to the orthogonal input coding scheme, which makes the most use of the missing-value information. In addition, the solid theoretical foundation of the SVR method contributes to the estimation performance together with the orthogonal input coding scheme. The promising estimation ability demonstrated in the results section suggests that the proposed approach provides a proper solution to the missing value estimation problem. The source code of the SVR method is available from for non-commercial use.
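The abstract ranks the imputation methods by NRMSE without stating the formula. As a minimal sketch, assuming one definition commonly used in the missing-value imputation literature (root mean squared error over the imputed entries, divided by the standard deviation of the true values), the metric could be computed as follows. The function name `nrmse` and the choice of the population standard deviation are assumptions made for illustration, not details taken from this record.

```python
import math


def nrmse(estimated, actual):
    """Normalized root mean squared error over imputed entries.

    Assumed definition (not stated in the record):
        sqrt(mean((estimated - actual)^2) / variance(actual))
    using the population variance of the true values.
    """
    if len(estimated) != len(actual):
        raise ValueError("estimated and actual must have the same length")
    n = len(actual)
    if n < 2:
        raise ValueError("need at least two values to normalize")

    # Mean squared error between imputed and true values.
    mse = sum((e - a) ** 2 for e, a in zip(estimated, actual)) / n

    # Population variance of the true values (the normalizer).
    mean_actual = sum(actual) / n
    var_actual = sum((a - mean_actual) ** 2 for a in actual) / n
    if var_actual == 0:
        raise ValueError("true values are constant; NRMSE is undefined")

    return math.sqrt(mse / var_actual)
```

Under this definition a perfect imputation scores 0, and filling every missing entry with the mean of the true values scores 1, so values well below 1 indicate that a method recovers more than a constant guess would.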
format Text
id pubmed-1403803
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1403803 2006-04-21 BMC Bioinformatics Methodology Article BioMed Central 2006-01-22 /pmc/articles/PMC1403803/ /pubmed/16426462 http://dx.doi.org/10.1186/1471-2105-7-32 Text en Copyright © 2006 Wang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Missing value estimation for DNA microarray gene expression data by Support Vector Regression imputation and orthogonal coding scheme
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1403803/
https://www.ncbi.nlm.nih.gov/pubmed/16426462
http://dx.doi.org/10.1186/1471-2105-7-32