
Model-free feature screening for categorical outcomes: Nonlinear effect detection and false discovery rate control



Bibliographic Details
Main authors: Zhang, Qingyang, Du, Yuchun
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2019
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6544247/
https://www.ncbi.nlm.nih.gov/pubmed/31150453
http://dx.doi.org/10.1371/journal.pone.0217463
Collection: PubMed
Abstract: Feature screening has become an essential prerequisite for the analysis of high-dimensional genomic data, as it is effective in reducing dimensionality and removing redundant features. However, existing feature screening methods have mostly relied on the assumptions of linear effects and independence (or weak dependence) between features, which may be inappropriate in practice. In this paper, we consider the problem of selecting continuous features for a categorical outcome from high-dimensional data. We propose a powerful statistical procedure that consists of two steps: a nonparametric significance test based on edge counts, and a multiple testing procedure with dependence adjustment for false discovery rate control. The new method has two novelties. First, the edge-count test directly targets distributional differences between groups, so it is sensitive to nonlinear effects. Second, we relax the independence assumption and adapt Efron’s procedure to adjust for the dependence between features. The performance of the proposed procedure, in terms of statistical power and false discovery rate, is illustrated with simulated data. We apply the new method to three genomic datasets to identify genes associated with colon, cervical and prostate cancers.
Institution: National Center for Biotechnology Information
Journal: PLoS One, Research Article, published 2019-05-31
License: © 2019 Zhang, Du. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.