Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data

In this paper we revisit the problem of fitting logistic regression to positive and unlabelled data. There are two key contributions. First, new light is shed on the properties of the frequently used naive method (in which unlabelled examples are treated as negative). In particular we show that naive m...

Bibliographic Details
Main Authors: Teisseyre, Paweł; Mielniczuk, Jan; Łazęcka, Małgorzata
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303724/
http://dx.doi.org/10.1007/978-3-030-50423-6_1
_version_ 1783548121313181696
author Teisseyre, Paweł
Mielniczuk, Jan
Łazęcka, Małgorzata
collection PubMed
description In this paper we revisit the problem of fitting logistic regression to positive and unlabelled data. There are two key contributions. First, new light is shed on the properties of the frequently used naive method (in which unlabelled examples are treated as negative). In particular, we show that the naive method is related to incorrect specification of the logistic model and that, consequently, the parameters in the naive method are shrunk towards zero. An interesting relationship between the shrinkage parameter and the label frequency is established. Second, we introduce a novel method of fitting the logistic model based on simultaneous estimation of the vector of coefficients and the label frequency. Importantly, the proposed method does not require prior estimation, which is a major obstacle in positive unlabelled learning. The method is superior in predicting posterior probability to both the naive method and the weighted likelihood method for several benchmark data sets. Moreover, it yields a consistently better estimator of label frequency than the other two known methods. We also introduce a simple but powerful representation of positive and unlabelled data under the Selected Completely at Random assumption, from which most properties of such a model follow straightforwardly.
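The shrinkage of the naive method described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes the usual Selected Completely at Random setup, in which each true positive is labelled independently with probability c (the label frequency), and it uses hypothetical values for the coefficients, c, and the sample size. The helper `fit_logistic` is a plain Newton-Raphson fit written for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Returns the coefficient vector with the intercept in position 0.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (y - p)
        # Hessian of the negative log-likelihood, with a tiny ridge
        # added only for numerical stability.
        hess = (Xb * (p * (1.0 - p))[:, None]).T @ Xb
        hess += 1e-8 * np.eye(Xb.shape[1])
        w = w + np.linalg.solve(hess, grad)
    return w

# Hypothetical simulation settings (not taken from the paper).
rng = np.random.default_rng(0)
n = 20000
beta = np.array([1.0, -2.0])   # true slope coefficients, intercept 0
c = 0.5                        # assumed label frequency P(S=1 | Y=1)

X = rng.normal(size=(n, 2))
y = rng.binomial(1, sigmoid(X @ beta))   # true class, unobserved in PU data
s = y * rng.binomial(1, c, size=n)       # observed label: 1 = labelled positive

w_oracle = fit_logistic(X, y)  # fit on the true class, for reference only
w_naive = fit_logistic(X, s)   # naive method: unlabelled treated as negative

oracle_ratio = np.linalg.norm(w_oracle[1:]) / np.linalg.norm(beta)
shrink_ratio = np.linalg.norm(w_naive[1:]) / np.linalg.norm(beta)
print("oracle slopes:", w_oracle[1:], "norm ratio:", oracle_ratio)
print("naive slopes: ", w_naive[1:], "norm ratio:", shrink_ratio)
```

In this setup the naive slope vector should come out noticeably shorter than the true one, while the reference fit on the true class should recover it closely. The exact ratio depends on c and on the distribution of the features.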
format Online
Article
Text
id pubmed-7303724
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7303724 2020-06-19 Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data Teisseyre, Paweł Mielniczuk, Jan Łazęcka, Małgorzata Computational Science – ICCS 2020 Article 2020-05-23 /pmc/articles/PMC7303724/ http://dx.doi.org/10.1007/978-3-030-50423-6_1 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data
topic Article