Dimension reduction with gene expression data using targeted variable importance measurement

Bibliographic Details
Main Authors: Wang, Hui, van der Laan, Mark J
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects: Methodology Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3166941/
https://www.ncbi.nlm.nih.gov/pubmed/21849016
http://dx.doi.org/10.1186/1471-2105-12-312
author Wang, Hui
van der Laan, Mark J
collection PubMed
description BACKGROUND: When a large number of candidate variables are present, a dimension reduction procedure is usually conducted to reduce the variable space before the subsequent analysis is carried out. The goal of dimension reduction is to find a list of candidate genes of a more manageable length that ideally includes all the relevant genes. Leaving many uninformative genes in the analysis can lead to biased estimates and reduced power. Therefore, dimension reduction is often considered a necessary precursor to the analysis because it not only reduces the cost of handling numerous variables but also has the potential to improve the performance of the downstream analysis algorithms. RESULTS: We propose a TMLE-VIM dimension reduction procedure based on the variable importance measurement (VIM) in the framework of targeted maximum likelihood estimation (TMLE). TMLE is an extension of maximum likelihood estimation that targets the parameter of interest. TMLE-VIM is a two-stage procedure. The first stage relies on a machine learning algorithm, and the second stage improves the first-stage estimate with respect to the parameter of interest. CONCLUSIONS: We demonstrate with simulations and data analyses that our approach not only enjoys the prediction power of machine learning algorithms, but also accounts for the correlation structures among variables and therefore produces better variable rankings. When used for dimension reduction, TMLE-VIM can help to obtain the shortest possible list containing the most truly associated variables.
format Online
Article
Text
id pubmed-3166941
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3166941 2011-09-06 Dimension reduction with gene expression data using targeted variable importance measurement Wang, Hui van der Laan, Mark J BMC Bioinformatics Methodology Article BioMed Central 2011-07-29 /pmc/articles/PMC3166941/ /pubmed/21849016 http://dx.doi.org/10.1186/1471-2105-12-312 Text en Copyright ©2011 Wang and van der Laan; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Dimension reduction with gene expression data using targeted variable importance measurement
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3166941/
https://www.ncbi.nlm.nih.gov/pubmed/21849016
http://dx.doi.org/10.1186/1471-2105-12-312