Discriminative machine learning for maximal representative subsampling

Bibliographic Details
Main authors:	Hauptmann, Tony; Fellenz, Sophie; Nathan, Laksan; Tüscher, Oliver; Kramer, Stefan
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10684887/ https://www.ncbi.nlm.nih.gov/pubmed/38017053 http://dx.doi.org/10.1038/s41598-023-48177-3

author	Hauptmann, Tony; Fellenz, Sophie; Nathan, Laksan; Tüscher, Oliver; Kramer, Stefan
collection	PubMed
description	Biased population samples pose a prevalent problem in the social sciences. Therefore, we present two novel methods that are based on positive-unlabeled learning to mitigate bias. Both methods leverage auxiliary information from a representative data set and train machine learning classifiers to determine the sample weights. The first method, named maximum representative subsampling (MRS), uses a classifier to iteratively remove instances, by assigning a sample weight of 0, from the biased data set until it aligns with the representative one. The second method is a variant of MRS, Soft-MRS, that iteratively adapts sample weights instead of removing samples completely. To assess the effectiveness of our approach, we induced artificial bias in a public census data set and examined the corrected estimates. We compare the performance of our methods against existing techniques, evaluating the ability of sample weights created with Soft-MRS or MRS to minimize differences and improve downstream classification tasks. Lastly, we demonstrate the applicability of the proposed methods in a real-world study of resilience research, exploring the influence of resilience on voting behavior. Through our work, we address the issue of bias in social science, amongst others, and provide a versatile methodology for bias reduction based on machine learning. Based on our experiments, we recommend using MRS for downstream classification tasks and Soft-MRS for downstream tasks where the relative bias of the dependent variable is relevant.
format	Online Article Text
id	pubmed-10684887
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-10684887 2023-11-30 Sci Rep Article
Nature Publishing Group UK 2023-11-27 /pmc/articles/PMC10684887/ /pubmed/38017053 http://dx.doi.org/10.1038/s41598-023-48177-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
title	Discriminative machine learning for maximal representative subsampling
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10684887/ https://www.ncbi.nlm.nih.gov/pubmed/38017053 http://dx.doi.org/10.1038/s41598-023-48177-3
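
The description field summarizes MRS only at a high level: a classifier separates the biased sample from the representative one, and biased rows are iteratively given weight 0 until the two align. The following is a minimal, hypothetical sketch of such a loop in plain Python. It assumes a logistic-regression discriminator evaluated in-sample, an AUC threshold (`tol`) as the "aligned" criterion, and a fixed fraction of rows (`step`) removed per round; the names `fit_logreg`, `auc` and `mrs_weights` and all parameter values are illustrative and not taken from the paper, whose classifier, validation scheme and stopping rule may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X: Sequence[Vector], y: Sequence[int],
               epochs: int = 200, lr: float = 0.1) -> Tuple[Vector, float]:
    """Unregularized logistic regression fitted by full-batch gradient descent."""
    d, n = len(X[0]), len(X)
    coef, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        g_coef, g_bias = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(c * v for c, v in zip(coef, xi))) - yi
            for j in range(d):
                g_coef[j] += err * xi[j]
            g_bias += err
        coef = [c - lr * g / n for c, g in zip(coef, g_coef)]
        bias -= lr * g_bias / n
    return coef, bias


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Mann-Whitney AUC: share of (pos, neg) pairs ranked correctly, ties count 1/2."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mrs_weights(biased: List[Vector], representative: List[Vector],
                step: float = 0.05, tol: float = 0.55,
                max_iter: int = 50) -> List[float]:
    """Hypothetical MRS-style loop: returns a 0/1 weight per row of `biased`."""
    keep = [True] * len(biased)
    for _ in range(max_iter):
        idx = [i for i, k in enumerate(keep) if k]
        if len(idx) < 2:
            break
        # Label retained biased rows 1 and representative rows 0, then fit a discriminator.
        X = [biased[i] for i in idx] + representative
        y = [1] * len(idx) + [0] * len(representative)
        coef, bias = fit_logreg(X, y)

        def score(x: Vector) -> float:
            return sigmoid(bias + sum(c * v for c, v in zip(coef, x)))

        s_b: Dict[int, float] = {i: score(biased[i]) for i in idx}
        s_r = [score(x) for x in representative]
        # Stop once the classifier can (almost) no longer tell the two sets apart.
        if auc(list(s_b.values()), s_r) <= tol:
            break
        # Otherwise give weight 0 to the retained rows that look most "biased".
        n_drop = max(1, int(step * len(idx)))
        for i in sorted(idx, key=lambda j: s_b[j], reverse=True)[:n_drop]:
            keep[i] = False
    return [1.0 if k else 0.0 for k in keep]


if __name__ == "__main__":
    # Toy data: the biased sample contains an extra cluster the representative one lacks.
    rep = [[i / 100] for i in range(100)]
    biased = [[i / 100] for i in range(100)] + [[1.5 + i / 100] for i in range(50)]
    w = mrs_weights(biased, rep)
    print("kept from shared part:", sum(w[:100]), "kept from extra cluster:", sum(w[100:]))
```

On this toy data the zero weights should fall on the extra cluster, since those rows are the ones the discriminator scores as most "biased". Soft-MRS, per the description, would instead shrink the weights of such rows gradually rather than setting them to 0; that variant is not sketched here.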

Similar Items