Synthetic data for design and evaluation of binary classifiers in the context of Bayesian transfer learning

Transfer learning (TL) techniques can enable effective learning in data-scarce domains by allowing one to re-purpose data or scientific knowledge available in relevant source domains for predictive tasks in a target domain of interest. In this Data in Brief article, we present a synthetic dataset fo...

Bibliographic Details
Main Authors: Maddouri, Omar; Qian, Xiaoning; Alexander, Francis J.; Dougherty, Edward R.; Yoon, Byung-Jun
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects: Data Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9011006/
https://www.ncbi.nlm.nih.gov/pubmed/35434232
http://dx.doi.org/10.1016/j.dib.2022.108113
author Maddouri, Omar
Qian, Xiaoning
Alexander, Francis J.
Dougherty, Edward R.
Yoon, Byung-Jun
collection PubMed
description Transfer learning (TL) techniques can enable effective learning in data-scarce domains by allowing one to re-purpose data or scientific knowledge available in relevant source domains for predictive tasks in a target domain of interest. In this Data in Brief article, we present a synthetic dataset for binary classification in the context of Bayesian transfer learning, which can be used for the design and evaluation of TL-based classifiers. For this purpose, we consider numerous combinations of classification settings, based on which we simulate a diverse set of feature-label distributions with varying learning complexity. For each set of model parameters, we provide a pair of target and source datasets that have been jointly sampled from the underlying feature-label distributions in the target and source domains, respectively. For both target and source domains, the data in a given class and domain are normally distributed, and the distributions across domains are related to each other through a joint prior. To keep the classification complexity consistent across the provided datasets, we have controlled the Bayes error so that it stays within a range of predefined values that mimic realistic classification scenarios across different relatedness levels. The provided datasets may serve as useful resources for designing and benchmarking transfer learning schemes for binary classification as well as for the estimation of classification error.
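The description above outlines the generation scheme only in words: class-conditional Gaussian data in a source and a target domain, linked through a joint prior, with the Bayes error kept inside a predefined range. The following is a minimal sketch of that idea under simplifying assumptions that are not taken from the article: a joint Gaussian prior over the class means with correlation `rho` standing in for the relatedness level, identity covariances in both domains, equal class priors, and rejection sampling as the way to control the Bayes error. All function names and default values are hypothetical. The closed form used for the Bayes error, Phi(-delta/2) with delta the Mahalanobis distance between the class means, holds for two equiprobable Gaussian classes that share one covariance matrix.

```python
import math
import numpy as np


def bayes_error_equal_cov(mu0, mu1, cov):
    """Bayes error of two equiprobable Gaussian classes sharing one covariance."""
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu0, dtype=float)
    delta = math.sqrt(diff @ np.linalg.solve(cov, diff))  # Mahalanobis distance
    # Phi(-delta / 2), written with the complementary error function.
    return 0.5 * math.erfc(delta / (2.0 * math.sqrt(2.0)))


def sample_domain_pair(dim, n_per_class, rho, rng, tau=0.5, sep=1.0):
    """Draw related source/target class means, then Gaussian data around them.

    Assumption: per coordinate, the source and target means of a class are
    jointly Gaussian with correlation rho (a stand-in for the joint prior).
    """
    data = {"source": ([], []), "target": ([], [])}
    means = {"source": [], "target": []}
    for label in (0, 1):
        prior_mean = np.full(dim, sep * (label - 0.5))
        z_shared = rng.standard_normal(dim)
        z_own = rng.standard_normal(dim)
        mu_s = prior_mean + tau * z_shared
        mu_t = prior_mean + tau * (rho * z_shared + math.sqrt(1.0 - rho**2) * z_own)
        for domain, mu in (("source", mu_s), ("target", mu_t)):
            # Identity covariance (assumed): add standard normal noise to the mean.
            data[domain][0].append(mu + rng.standard_normal((n_per_class, dim)))
            data[domain][1].append(np.full(n_per_class, label))
            means[domain].append(mu)
    out = {d: (np.vstack(xs), np.concatenate(ys)) for d, (xs, ys) in data.items()}
    return out, means


def sample_with_controlled_error(dim, n_per_class, rho, rng,
                                 err_range=(0.1, 0.3), max_tries=10_000):
    """Redraw the model parameters until the target Bayes error is in range."""
    for _ in range(max_tries):
        data, means = sample_domain_pair(dim, n_per_class, rho, rng)
        err = bayes_error_equal_cov(means["target"][0], means["target"][1], np.eye(dim))
        if err_range[0] <= err <= err_range[1]:
            return data, err
    raise RuntimeError("no parameter draw met the requested Bayes error range")


rng = np.random.default_rng(0)
data, err = sample_with_controlled_error(dim=2, n_per_class=50, rho=0.8, rng=rng)
x_target, y_target = data["target"]
print(x_target.shape, y_target.shape, round(err, 3))
```

Setting `rho` closer to 1 makes the target means track the source means more tightly, which is one simple way to vary how much the source data can help a target classifier in this sketch.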
format Online
Article
Text
id pubmed-9011006
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9011006
2022-04-16
Data Brief
Data Article
Elsevier
2022-04-02
/pmc/articles/PMC9011006/
/pubmed/35434232
http://dx.doi.org/10.1016/j.dib.2022.108113
Text
en
© 2022 https://creativecommons.org/licenses/by-nc-nd/4.0/
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Synthetic data for design and evaluation of binary classifiers in the context of Bayesian transfer learning
topic Data Article