
Dimension reduction with gene expression data using targeted variable importance measurement

BACKGROUND: When a large number of candidate variables are present, a dimension reduction procedure is usually conducted to reduce the variable space before the subsequent analysis is carried out. The goal of dimension reduction is to find a list of candidate genes with a more operable length ideall...


Bibliographic Details
Main Authors:	Wang, Hui, van der Laan, Mark J
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3166941/ https://www.ncbi.nlm.nih.gov/pubmed/21849016 http://dx.doi.org/10.1186/1471-2105-12-312

_version_	1782211210841686016
author	Wang, Hui van der Laan, Mark J
author_facet	Wang, Hui van der Laan, Mark J
author_sort	Wang, Hui
collection	PubMed
description	BACKGROUND: When a large number of candidate variables are present, a dimension reduction procedure is usually conducted to reduce the variable space before the subsequent analysis is carried out. The goal of dimension reduction is to find a list of candidate genes with a more operable length ideally including all the relevant genes. Leaving many uninformative genes in the analysis can lead to biased estimates and reduced power. Therefore, dimension reduction is often considered a necessary predecessor of the analysis because it can not only reduce the cost of handling numerous variables, but also has the potential to improve the performance of the downstream analysis algorithms. RESULTS: We propose a TMLE-VIM dimension reduction procedure based on the variable importance measurement (VIM) in the framework of targeted maximum likelihood estimation (TMLE). TMLE is an extension of maximum likelihood estimation targeting the parameter of interest. TMLE-VIM is a two-stage procedure. The first stage resorts to a machine learning algorithm, and the second stage improves the first stage estimation with respect to the parameter of interest. CONCLUSIONS: We demonstrate with simulations and data analyses that our approach not only enjoys the prediction power of machine learning algorithms, but also accounts for the correlation structures among variables and therefore produces better variable rankings. When utilized in dimension reduction, TMLE-VIM can help to obtain the shortest possible list with the most truly associated variables.
format	Online Article Text
id	pubmed-3166941
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-31669412011-09-06 Dimension reduction with gene expression data using targeted variable importance measurement Wang, Hui van der Laan, Mark J BMC Bioinformatics Methodology Article BioMed Central 2011-07-29 /pmc/articles/PMC3166941/ /pubmed/21849016 http://dx.doi.org/10.1186/1471-2105-12-312 Text en Copyright ©2011 Wang and van der Laan; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Wang, Hui van der Laan, Mark J Dimension reduction with gene expression data using targeted variable importance measurement
title	Dimension reduction with gene expression data using targeted variable importance measurement
title_full	Dimension reduction with gene expression data using targeted variable importance measurement
title_fullStr	Dimension reduction with gene expression data using targeted variable importance measurement
title_full_unstemmed	Dimension reduction with gene expression data using targeted variable importance measurement
title_short	Dimension reduction with gene expression data using targeted variable importance measurement
title_sort	dimension reduction with gene expression data using targeted variable importance measurement
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3166941/ https://www.ncbi.nlm.nih.gov/pubmed/21849016 http://dx.doi.org/10.1186/1471-2105-12-312
work_keys_str_mv	AT wanghui dimensionreductionwithgeneexpressiondatausingtargetedvariableimportancemeasurement AT vanderlaanmarkj dimensionreductionwithgeneexpressiondatausingtargetedvariableimportancemeasurement
