Simple strategies for semi-supervised feature selection
What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data...
Main authors: | Sechidis, Konstantinos, Brown, Gavin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954040/ https://www.ncbi.nlm.nih.gov/pubmed/31983804 http://dx.doi.org/10.1007/s10994-017-5648-2 |
_version_ | 1783486724361420800 |
---|---|
author | Sechidis, Konstantinos Brown, Gavin |
author_facet | Sechidis, Konstantinos Brown, Gavin |
author_sort | Sechidis, Konstantinos |
collection | PubMed |
description | What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data are all positives, or assume them all negatives. These minimalist, seemingly naive, approaches have not previously been studied in depth. However, with theoretical and empirical studies, we show they provide powerful results for feature selection, via hypothesis testing and feature ranking. Combining them with some “soft” prior knowledge of the domain, we derive two novel algorithms (Semi-JMI, Semi-IAMB) that outperform significantly more complex competing methods, showing particularly good performance when the labels are missing-not-at-random. We conclude that simple approaches to this problem can work surprisingly well, and in many situations we can provably recover the exact feature selection dynamics, as if we had labelled the entire dataset. |
format | Online Article Text |
id | pubmed-6954040 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-69540402020-01-23 Simple strategies for semi-supervised feature selection Sechidis, Konstantinos Brown, Gavin Mach Learn Article What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this—how much we can gain from two simple classifier-independent strategies. If we have some binary labelled data and some unlabelled, we could assume the unlabelled data are all positives, or assume them all negatives. These minimalist, seemingly naive, approaches have not previously been studied in depth. However, with theoretical and empirical studies, we show they provide powerful results for feature selection, via hypothesis testing and feature ranking. Combining them with some “soft” prior knowledge of the domain, we derive two novel algorithms (Semi-JMI, Semi-IAMB) that outperform significantly more complex competing methods, showing particularly good performance when the labels are missing-not-at-random. We conclude that simple approaches to this problem can work surprisingly well, and in many situations we can provably recover the exact feature selection dynamics, as if we had labelled the entire dataset. Springer US 2017-07-17 2018 /pmc/articles/PMC6954040/ /pubmed/31983804 http://dx.doi.org/10.1007/s10994-017-5648-2 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
spellingShingle | Article Sechidis, Konstantinos Brown, Gavin Simple strategies for semi-supervised feature selection |
title | Simple strategies for semi-supervised feature selection |
title_full | Simple strategies for semi-supervised feature selection |
title_fullStr | Simple strategies for semi-supervised feature selection |
title_full_unstemmed | Simple strategies for semi-supervised feature selection |
title_short | Simple strategies for semi-supervised feature selection |
title_sort | simple strategies for semi-supervised feature selection |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954040/ https://www.ncbi.nlm.nih.gov/pubmed/31983804 http://dx.doi.org/10.1007/s10994-017-5648-2 |
work_keys_str_mv | AT sechidiskonstantinos simplestrategiesforsemisupervisedfeatureselection AT browngavin simplestrategiesforsemisupervisedfeatureselection |
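
The two strategies named in the description field (treat every unlabelled example as positive, or treat every one as negative) can be illustrated with a small ranking sketch. This is not the authors' Semi-JMI or Semi-IAMB algorithm: it is a minimal illustration that assumes discrete features, marks missing labels with `None`, and scores each feature with a plug-in mutual information estimate against the surrogate label. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def surrogate_labels(labels, fill):
    """Replace missing labels (None) with a constant.

    fill=1 assumes all unlabelled examples are positive,
    fill=0 assumes they are all negative.
    """
    return [fill if y is None else y for y in labels]


def rank_features(features, labels, fill):
    """Rank feature names by mutual information with the surrogate label."""
    y = surrogate_labels(labels, fill)
    scores = {name: mutual_information(col, y) for name, col in features.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical toy data: four labelled examples, four unlabelled ones.
labels = [1, 1, 0, 0, None, None, None, None]
features = {
    "relevant": [1, 1, 0, 0, 1, 1, 0, 0],  # agrees with the known labels
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],     # unrelated to the labels
}

ranking_neg, scores_neg = rank_features(features, labels, fill=0)
ranking_pos, scores_pos = rank_features(features, labels, fill=1)
```

On this toy data both surrogates place `relevant` ahead of `noise`, which is the kind of behaviour the description attributes to the two strategies. Which surrogate is preferable in practice depends on the domain knowledge the description refers to, and the sketch makes no attempt to model that.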