High Dimensional Variable Selection with Error Control

Background. Iterative sure independence screening (ISIS) is a popular method for selecting important variables while retaining most of the informative variables relevant to the outcome in high-throughput data. However, it is not only computationally intensive but may also cause a high false disco...

Bibliographic Details
Main authors: Kim, Sangjin; Halabi, Susan
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5002494/
https://www.ncbi.nlm.nih.gov/pubmed/27597974
http://dx.doi.org/10.1155/2016/8209453
author Kim, Sangjin
Halabi, Susan
collection PubMed
description Background. Iterative sure independence screening (ISIS) is a popular method for selecting important variables while retaining most of the informative variables relevant to the outcome in high-throughput data. However, it is not only computationally intensive but may also cause a high false discovery rate (FDR). We propose using the FDR as a screening method to reduce the high dimension to a lower dimension while also controlling the FDR with three popular variable selection methods: LASSO, SCAD, and MCP. Method. The three methods with the proposed screenings were applied to prostate cancer data with the presence of metastasis as the outcome. Results. Simulations showed that the three variable selection methods with the proposed screenings controlled the predefined FDR and produced high area under the receiver operating characteristic curve (AUROC) scores. In the prostate cancer example, LASSO and MCP selected 12 and 8 genes and produced AUROC scores of 0.746 and 0.764, respectively. Conclusions. We demonstrated that the variable selection methods with the sequential use of FDR and ISIS not only controlled the predefined FDR in the final models but also had relatively high AUROC scores.
format Online
Article
Text
id pubmed-5002494
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-5002494 2016-09-05 High Dimensional Variable Selection with Error Control Kim, Sangjin Halabi, Susan Biomed Res Int Research Article Hindawi Publishing Corporation 2016 2016-08-15 /pmc/articles/PMC5002494/ /pubmed/27597974 http://dx.doi.org/10.1155/2016/8209453 Text en Copyright © 2016 S. Kim and S. Halabi. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title High Dimensional Variable Selection with Error Control
topic Research Article
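
The abstract in this record describes using the FDR as a screening step before variable selection, but it does not say which FDR procedure is used. As a minimal sketch, assuming the common Benjamini-Hochberg step-up rule applied to one marginal p-value per variable, the screening could look like the following. The function name `bh_screen` and the p-values are hypothetical and serve only as illustration; the later ISIS and LASSO/SCAD/MCP stages are not reproduced here.

```python
def bh_screen(p_values, alpha=0.05):
    """Return sorted indices of variables kept by the Benjamini-Hochberg rule.

    Assumption: each variable has one marginal p-value; the record's
    abstract does not specify how the authors compute or threshold them.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Rank variables by ascending p-value.
    order = sorted(range(m), key=lambda j: p_values[j])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank * alpha / m:
            cutoff = rank
    # Step-up rule: keep every variable ranked at or below that cutoff.
    return sorted(order[:cutoff])


# Hypothetical marginal p-values for six candidate genes.
p = [0.001, 0.30, 0.012, 0.04, 0.75, 0.02]
kept = bh_screen(p, alpha=0.05)
print(kept)  # [0, 2, 5]
```

In this toy input, the gene with p = 0.04 is dropped because its rank-4 threshold is 4 × 0.05 / 6 ≈ 0.033. The retained indices would then be passed to a downstream selector, so the penalized model only sees the reduced set of variables.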