Simple strategies for semi-supervised feature selection

What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data...

Bibliographic Details
Main authors:	Sechidis, Konstantinos; Brown, Gavin
Format:	Online Article Text
Language:	English
Published:	Springer US, 2017
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954040/ https://www.ncbi.nlm.nih.gov/pubmed/31983804 http://dx.doi.org/10.1007/s10994-017-5648-2

author	Sechidis, Konstantinos Brown, Gavin
collection	PubMed
description	What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data are all positives, or assume them all negatives. These minimalist, seemingly naive, approaches have not previously been studied in depth. However, with theoretical and empirical studies, we show they provide powerful results for feature selection, via hypothesis testing and feature ranking. Combining them with some “soft” prior knowledge of the domain, we derive two novel algorithms (Semi-JMI, Semi-IAMB) that outperform significantly more complex competing methods, showing particularly good performance when the labels are missing-not-at-random. We conclude that simple approaches to this problem can work surprisingly well, and in many situations we can provably recover the exact feature selection dynamics, as if we had labelled the entire dataset.
format	Online Article Text
id	pubmed-6954040
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Springer US
record_format	MEDLINE/PubMed
spelling	pubmed-6954040 2020-01-23 Simple strategies for semi-supervised feature selection Sechidis, Konstantinos Brown, Gavin Mach Learn Article What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data are all positives, or assume them all negatives. These minimalist, seemingly naive, approaches have not previously been studied in depth. However, with theoretical and empirical studies, we show they provide powerful results for feature selection, via hypothesis testing and feature ranking. Combining them with some “soft” prior knowledge of the domain, we derive two novel algorithms (Semi-JMI, Semi-IAMB) that outperform significantly more complex competing methods, showing particularly good performance when the labels are missing-not-at-random. We conclude that simple approaches to this problem can work surprisingly well, and in many situations we can provably recover the exact feature selection dynamics, as if we had labelled the entire dataset. Springer US 2017-07-17 2018 /pmc/articles/PMC6954040/ /pubmed/31983804 http://dx.doi.org/10.1007/s10994-017-5648-2 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title	Simple strategies for semi-supervised feature selection
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954040/ https://www.ncbi.nlm.nih.gov/pubmed/31983804 http://dx.doi.org/10.1007/s10994-017-5648-2
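
The abstract's two strategies amount to filling every missing label with the same value (all positives or all negatives) and then scoring each feature against that surrogate label. The Python sketch below illustrates the idea under assumptions of our own; it is not the authors' implementation. It assumes discrete features, labels encoded as 1/0 with None for unlabelled examples, a plug-in mutual information estimate in nats for ranking, and a G-test (G = 2N times the estimated mutual information) against the 1-degree-of-freedom chi-squared threshold, which holds only for binary features. All function names are hypothetical, and the sketch leaves out the "soft" prior-knowledge step behind Semi-JMI and Semi-IAMB.

```python
import math
from collections import Counter

# 95th percentile of the chi-squared distribution with 1 degree of freedom.
CHI2_CRIT_1DOF_05 = 3.841458820694124


def surrogate_labels(y, assume_positive=True):
    """Fill every missing label (None) with 1 ("all positives") or 0 ("all negatives").

    Hypothetical helper; labelled entries are left untouched.
    """
    fill = 1 if assume_positive else 0
    return [fill if label is None else label for label in y]


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
        mi += (n_ab / n) * math.log(n_ab * n / (px[a] * py[b]))
    return mi


def rank_features(features, y, assume_positive=True):
    """Rank feature columns by decreasing mutual information with the surrogate label."""
    y_tilde = surrogate_labels(y, assume_positive)
    scores = [mutual_information(column, y_tilde) for column in features]
    order = sorted(range(len(features)), key=lambda k: scores[k], reverse=True)
    return order, scores


def g_test_rejects_independence(x, y):
    """G-test at alpha = 0.05, assuming x and y are both binary (1 degree of freedom)."""
    g = 2 * len(x) * mutual_information(x, y)
    return g > CHI2_CRIT_1DOF_05


if __name__ == "__main__":
    y = [1, 1, 0, 0, None, None, None, None]  # None marks an unlabelled example
    features = [
        [1, 1, 0, 0, 1, 1, 0, 0],  # tracks the observed labels
        [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the labels
    ]
    order, scores = rank_features(features, y, assume_positive=True)
    print("ranking:", order, "scores:", [round(s, 4) for s in scores])
    y_tilde = surrogate_labels(y, assume_positive=True)
    print("feature 0 dependent at 5%:", g_test_rejects_independence(features[0], y_tilde))
```

On this toy data the informative feature ranks first and the unrelated feature scores exactly zero. With only eight examples, however, the first feature's G statistic (about 3.45) still falls short of the 3.84 threshold, so the test's power depends on sample size.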
