Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data
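The record's abstract attributes the naive method's behaviour to misspecification. Under the SCAR assumption, P(S = 1 | x) = c · P(Y = 1 | x), and when the label frequency c < 1 this is no longer a logistic function of x, yet the naive method fits a logistic model to S as if it were Y. The minimal simulation sketch below assumes one standard-normal covariate, true coefficients (0, 2) and c = 0.3; the helper names `simulate_scar` and `fit_logistic` are hypothetical. It compares the naive fit with an "oracle" fit on the true labels. It only illustrates the shrinkage empirically and does not reproduce the paper's formula linking the shrinkage parameter to the label frequency.

```python
import numpy as np


def simulate_scar(n, beta0, beta1, c, rng):
    """Simulate positive-unlabelled data under the SCAR assumption (illustrative).

    Y follows a logistic model in one covariate x; each positive (Y = 1) is
    labelled (S = 1) independently of x with probability c (label frequency).
    """
    x = rng.standard_normal(n)
    p_y = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
    y = rng.random(n) < p_y
    s = y & (rng.random(n) < c)  # SCAR: given Y = 1, labelling ignores x
    return x, y.astype(float), s.astype(float)


def fit_logistic(x, t, iters=50):
    """Logistic regression of a 0/1 target t on (1, x) via Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        grad = X.T @ (t - p)                         # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        b = b + np.linalg.solve(hess, grad)
    return b


rng = np.random.default_rng(0)
x, y, s = simulate_scar(20_000, beta0=0.0, beta1=2.0, c=0.3, rng=rng)
b_oracle = fit_logistic(x, y)  # uses the true labels Y (unavailable in practice)
b_naive = fit_logistic(x, s)   # naive method: unlabelled treated as negative
print("oracle:", b_oracle, "naive:", b_naive)
```

In such a run the naive slope should come out clearly below the oracle slope, because the naive model targets c · P(Y = 1 | x), whose log-odds flatten out as P(Y = 1 | x) approaches 1.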
Main authors: | Teisseyre, Paweł; Mielniczuk, Jan; Łazęcka, Małgorzata |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303724/ http://dx.doi.org/10.1007/978-3-030-50423-6_1 |
_version_ | 1783548121313181696 |
---|---|
author | Teisseyre, Paweł; Mielniczuk, Jan; Łazęcka, Małgorzata |
collection | PubMed |
description | In the paper we revisit the problem of fitting logistic regression to positive and unlabelled data. There are two key contributions. First, we shed new light on the properties of the frequently used naive method, in which unlabelled examples are treated as negative. In particular, we show that the naive method is related to incorrect specification of the logistic model and, consequently, that its parameters are shrunk towards zero. An interesting relationship between the shrinkage parameter and the label frequency is established. Second, we introduce a novel method of fitting the logistic model based on simultaneous estimation of the vector of coefficients and the label frequency. Importantly, the proposed method does not require estimation of the class prior, which is a major obstacle in positive unlabelled learning. On several benchmark data sets, the method predicts the posterior probability more accurately than both the naive method and the weighted likelihood method. Moreover, it consistently yields a better estimator of the label frequency than two other known methods. We also introduce a simple but powerful representation of positive and unlabelled data under the Selected Completely at Random (SCAR) assumption, from which most properties of such a model follow straightforwardly. |
format | Online Article Text |
id | pubmed-7303724 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7303724 (2020-06-19). Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data. Teisseyre, Paweł; Mielniczuk, Jan; Łazęcka, Małgorzata. Computational Science – ICCS 2020. Article. 2020-05-23. /pmc/articles/PMC7303724/ http://dx.doi.org/10.1007/978-3-030-50423-6_1. Text, en. © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303724/ http://dx.doi.org/10.1007/978-3-030-50423-6_1 |
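The description above says the proposed method estimates the coefficient vector and the label frequency simultaneously, but this record does not give the authors' algorithm. As a minimal sketch under stated assumptions (one covariate, SCAR, observed model P(S = 1 | x) = c · σ(b0 + b1 · x)), the joint likelihood in (b, c) can be maximised with EM, treating Y as latent for unlabelled points. EM, the function name `fit_pu_joint_em`, its parameters and the simulated data are our own choices, not the paper's implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_pu_joint_em(x, s, c_init=0.5, max_iter=2000, tol=1e-8):
    """Hypothetical sketch: jointly estimate (b0, b1) and label frequency c by EM.

    Assumed observed-data model (SCAR): P(S = 1 | x) = c * sigmoid(b0 + b1 * x);
    the true class Y is latent for unlabelled points (S = 0).
    """
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)
    c = c_init  # start below 1: c = 1 is a fixed point that reproduces the naive fit
    prev_ll = -np.inf
    for _ in range(max_iter):
        p = sigmoid(X @ b)
        # E-step: w = P(Y = 1 | x, S); 1 if labelled, (1 - c) p / (1 - c p) otherwise.
        w = np.where(s == 1.0, 1.0, (1.0 - c) * p / (1.0 - c * p))
        # M-step for c: number labelled divided by expected number of positives.
        c = s.sum() / w.sum()
        # M-step for b: logistic regression on soft targets w (warm-started Newton steps).
        for _ in range(3):
            p = sigmoid(X @ b)
            grad = X.T @ (w - p)
            hess = X.T @ (X * (p * (1.0 - p))[:, None])
            b = b + np.linalg.solve(hess, grad)
        # Observed-data log-likelihood, used only as a stopping rule.
        q = c * sigmoid(X @ b)
        ll = np.sum(s * np.log(q) + (1.0 - s) * np.log1p(-q))
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return b, c


rng = np.random.default_rng(1)
x = rng.standard_normal(20_000)
y = rng.random(x.size) < sigmoid(2.0 * x)            # true coefficients (0, 2)
s = (y & (rng.random(x.size) < 0.3)).astype(float)  # SCAR labelling with c = 0.3
b_hat, c_hat = fit_pu_joint_em(x, s)
print("coefficients:", b_hat, "label frequency:", c_hat)
```

Because c is estimated together with b, no separate class-prior estimate is needed; the prior can be read off afterwards as the labelled fraction divided by the estimated c. EM is chosen here only because each step has a closed form or a standard logistic fit; a direct numerical optimiser over (b, c) would target the same likelihood.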