Neyman-Pearson classification algorithms and NP receiver operating characteristics

In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this...

Bibliographic Details
Main authors: Tong, Xin; Feng, Yang; Li, Jingyi Jessica
Format: Online Article Text
Language: English
Published: American Association for the Advancement of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5804623/
https://www.ncbi.nlm.nih.gov/pubmed/29423442
http://dx.doi.org/10.1126/sciadv.aao1659
_version_ 1783298878247796736
author Tong, Xin
Feng, Yang
Li, Jingyi Jessica
author_facet Tong, Xin
Feng, Yang
Li, Jingyi Jessica
author_sort Tong, Xin
collection PubMed
description In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, α, on the type I error. Despite its century-long history in hypothesis testing, the NP paradigm has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than α do not satisfy the type I error control objective because the resulting classifiers are likely to have type I errors much larger than α, and the NP paradigm has not been properly implemented in practice. We develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, such as logistic regression, support vector machines, and random forests. Powered by this algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands motivated by the popular ROC curves. NP-ROC bands will help choose α in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data studies.
format Online
Article
Text
id pubmed-5804623
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher American Association for the Advancement of Science
record_format MEDLINE/PubMed
spelling pubmed-5804623 2018-02-08 Neyman-Pearson classification algorithms and NP receiver operating characteristics Tong, Xin Feng, Yang Li, Jingyi Jessica Sci Adv Research Articles
American Association for the Advancement of Science 2018-02-02 /pmc/articles/PMC5804623/ /pubmed/29423442 http://dx.doi.org/10.1126/sciadv.aao1659 Text en Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC). http://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license (http://creativecommons.org/licenses/by-nc/4.0/) , which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.
spellingShingle Research Articles
Tong, Xin
Feng, Yang
Li, Jingyi Jessica
Neyman-Pearson classification algorithms and NP receiver operating characteristics
title Neyman-Pearson classification algorithms and NP receiver operating characteristics
title_full Neyman-Pearson classification algorithms and NP receiver operating characteristics
title_fullStr Neyman-Pearson classification algorithms and NP receiver operating characteristics
title_full_unstemmed Neyman-Pearson classification algorithms and NP receiver operating characteristics
title_short Neyman-Pearson classification algorithms and NP receiver operating characteristics
title_sort neyman-pearson classification algorithms and np receiver operating characteristics
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5804623/
https://www.ncbi.nlm.nih.gov/pubmed/29423442
http://dx.doi.org/10.1126/sciadv.aao1659
work_keys_str_mv AT tongxin neymanpearsonclassificationalgorithmsandnpreceiveroperatingcharacteristics
AT fengyang neymanpearsonclassificationalgorithmsandnpreceiveroperatingcharacteristics
AT lijingyijessica neymanpearsonclassificationalgorithmsandnpreceiveroperatingcharacteristics
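
Illustrative note on the abstract: the description above says that limiting the empirical type I error to α does not control the population type I error. The Python sketch below shows one way to see this, under assumptions that are not stated in this record: class 0 scores are continuous and i.i.d., an observation is predicted as class 1 when its score exceeds the threshold, and the threshold is the k-th smallest of n held-out class 0 scores. The tolerance `delta` and all function names are hypothetical; this is not the API of the R package nproc, and it is only a sketch of an order-statistic thresholding rule, not a definitive implementation of the authors' algorithm.

```python
import math


def violation_bound(k, n, alpha):
    """Bound on P(type I error > alpha) when the threshold is the k-th
    smallest of n held-out class 0 scores (assumes continuous i.i.d. scores).
    This equals P(Binomial(n, 1 - alpha) >= k)."""
    return sum(
        math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def np_rank(n, alpha, delta):
    """Smallest rank k whose violation bound is at most delta (hypothetical
    tolerance). Returns None when n is too small for any rank to qualify."""
    tail = 0.0
    best = None
    # Accumulate the binomial tail from k = n downward; it grows as k shrinks.
    for k in range(n, 0, -1):
        tail += math.comb(n, k) * (1 - alpha) ** k * alpha ** (n - k)
        if tail > delta:
            break
        best = k
    return best


def np_threshold(scores0, alpha, delta):
    """Score threshold taken at the rank chosen by np_rank."""
    k = np_rank(len(scores0), alpha, delta)
    if k is None:
        raise ValueError("too few class 0 scores for this alpha and delta")
    return sorted(scores0)[k - 1]


if __name__ == "__main__":
    n, alpha, delta = 1000, 0.05, 0.05
    # Naive rule: allow at most alpha * n class 0 scores above the threshold,
    # i.e. use the 950th smallest score when n = 1000 and alpha = 0.05.
    naive_k = 950
    print("naive rank:", naive_k, "bound:", violation_bound(naive_k, n, alpha))
    k = np_rank(n, alpha, delta)
    print("adjusted rank:", k, "bound:", violation_bound(k, n, alpha))
```

Under these assumptions, the naive rank leaves roughly an even chance that the population type I error exceeds α, whereas the adjusted rank moves the threshold higher so that this chance is at most `delta`. The same bound also implies a minimum held-out sample size: with α = 0.05 and `delta` = 0.05, no rank qualifies until n reaches 59.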