Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data

In this paper we revisit the problem of fitting logistic regression to positive and unlabelled data. There are two key contributions. First, new light is shed on the properties of the frequently used naive method (in which unlabelled examples are treated as negative). In particular, we show that the naive m...

Bibliographic Details
Main Authors:	Teisseyre, Paweł; Mielniczuk, Jan; Łazęcka, Małgorzata
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303724/ http://dx.doi.org/10.1007/978-3-030-50423-6_1

author	Teisseyre, Paweł; Mielniczuk, Jan; Łazęcka, Małgorzata
author_sort	Teisseyre, Paweł
collection	PubMed
description	In this paper we revisit the problem of fitting logistic regression to positive and unlabelled data. There are two key contributions. First, new light is shed on the properties of the frequently used naive method (in which unlabelled examples are treated as negative). In particular, we show that the naive method is related to incorrect specification of the logistic model and, consequently, that the parameters in the naive method are shrunk towards zero. An interesting relationship between the shrinkage parameter and the label frequency is established. Second, we introduce a novel method of fitting the logistic model based on simultaneous estimation of the vector of coefficients and the label frequency. Importantly, the proposed method does not require prior estimation, which is a major obstacle in positive unlabelled learning. The method is superior in predicting posterior probability to both the naive method and the weighted likelihood method for several benchmark data sets. Moreover, it yields a consistently better estimator of label frequency than the other two known methods. We also introduce a simple but powerful representation of positive and unlabelled data under the Selected Completely at Random assumption, which straightforwardly yields most properties of such a model.
format	Online Article Text
id	pubmed-7303724
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-7303724 2020-06-19 Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data Teisseyre, Paweł; Mielniczuk, Jan; Łazęcka, Małgorzata Computational Science – ICCS 2020 Article 2020-05-23 /pmc/articles/PMC7303724/ http://dx.doi.org/10.1007/978-3-030-50423-6_1 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	Different Strategies of Fitting Logistic Regression for Positive and Unlabelled Data
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303724/ http://dx.doi.org/10.1007/978-3-030-50423-6_1

Similar Items