
Simple strategies for semi-supervised feature selection

What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data...


Bibliographic Details
Main authors: Sechidis, Konstantinos; Brown, Gavin
Format: Online Article Text
Language: English
Published: Springer US 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954040/
https://www.ncbi.nlm.nih.gov/pubmed/31983804
http://dx.doi.org/10.1007/s10994-017-5648-2
collection PubMed
description What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data are all positives, or assume them all negatives. These minimalist, seemingly naive, approaches have not previously been studied in depth. However, with theoretical and empirical studies, we show they provide powerful results for feature selection, via hypothesis testing and feature ranking. Combining them with some “soft” prior knowledge of the domain, we derive two novel algorithms (Semi-JMI, Semi-IAMB) that outperform significantly more complex competing methods, showing particularly good performance when the labels are missing-not-at-random. We conclude that simple approaches to this problem can work surprisingly well, and in many situations we can provably recover the exact feature selection dynamics, as if we had labelled the entire dataset.
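The two strategies named in the description can be illustrated with a minimal sketch. It is not the authors' Semi-JMI or Semi-IAMB algorithm: it only fills the missing binary labels with a single assumed class and ranks discrete features by a plug-in mutual information estimate against the resulting surrogate label. The function names, the use of None for a missing label, and the choice of estimator are assumptions made for this illustration.

```python
from collections import Counter
from math import log


def surrogate_labels(labels, assume_positive):
    """Fill missing labels (None) with one assumed class: 1 or 0."""
    fill = 1 if assume_positive else 0
    return [fill if y is None else y for y in labels]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def rank_features(rows, labels, assume_positive):
    """Rank feature columns by mutual information with the surrogate label.

    rows: list of equal-length lists of discrete feature values.
    labels: list of 1, 0 or None (None marks an unlabelled example).
    Returns (feature indices sorted by decreasing score, list of scores).
    """
    y = surrogate_labels(labels, assume_positive)
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], y)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Toy data (hypothetical): feature 0 tracks the class, feature 1 is noise.
    rows = [[1, 0], [1, 0], [1, 1], [1, 1], [0, 0], [0, 0], [0, 1], [0, 1]]
    labels = [1, None, 1, None, 0, None, 0, None]
    for assume_positive in (True, False):
        order, scores = rank_features(rows, labels, assume_positive)
        print(assume_positive, order, scores)
```

In this toy data the informative feature is ranked first under either assumption, which is the kind of behaviour the description refers to; it does not demonstrate the paper's theoretical guarantees.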
id pubmed-6954040
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6954040 2020-01-23 Mach Learn Article Springer US 2017-07-17 2018 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.