Autoencoders for sample size estimation for fully connected neural network classifiers

Bibliographic Details
Main Authors:	Gulamali, Faris F.; Sawant, Ashwin S.; Kovatch, Patricia; Glicksberg, Benjamin; Charney, Alexander; Nadkarni, Girish N.; Oermann, Eric
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9747810/ https://www.ncbi.nlm.nih.gov/pubmed/36513729 http://dx.doi.org/10.1038/s41746-022-00728-0

author	Gulamali, Faris F.; Sawant, Ashwin S.; Kovatch, Patricia; Glicksberg, Benjamin; Charney, Alexander; Nadkarni, Girish N.; Oermann, Eric
collection	PubMed
description	Sample size estimation is a crucial step in experimental design but is understudied in the context of deep learning. Currently, estimating the quantity of labeled data needed to train a classifier to a desired performance is largely based on prior experience with similar models and problems or on untested heuristics. In many supervised machine learning applications, data labeling can be expensive and time-consuming and would benefit from a more rigorous means of estimating labeling requirements. Here, we study the problem of estimating the minimum sample size of labeled training data necessary for training computer vision models as an exemplar for other deep learning problems. We consider the problem of identifying the minimal number of labeled data points to achieve a generalizable representation of the data, a minimum converging sample (MCS). We use autoencoder loss to estimate the MCS for fully connected neural network classifiers. At sample sizes smaller than the MCS estimate, fully connected networks fail to distinguish classes, and at sample sizes above the MCS estimate, generalizability strongly correlates with the loss function of the autoencoder. We provide an easily accessible, code-free, and dataset-agnostic tool to estimate sample sizes for fully connected networks. Taken together, our findings suggest that MCS and convergence estimation are promising methods to guide sample size estimates for data collection and labeling prior to training deep learning models in computer vision.
format	Online Article Text
id	pubmed-9747810
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-9747810 2022-12-15 Autoencoders for sample size estimation for fully connected neural network classifiers Gulamali, Faris F. Sawant, Ashwin S. Kovatch, Patricia Glicksberg, Benjamin Charney, Alexander Nadkarni, Girish N. Oermann, Eric NPJ Digit Med Article
Nature Publishing Group UK 2022-12-13 /pmc/articles/PMC9747810/ /pubmed/36513729 http://dx.doi.org/10.1038/s41746-022-00728-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	Autoencoders for sample size estimation for fully connected neural network classifiers
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9747810/ https://www.ncbi.nlm.nih.gov/pubmed/36513729 http://dx.doi.org/10.1038/s41746-022-00728-0
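The abstract describes using autoencoder loss to estimate the minimum converging sample (MCS). The following is a minimal sketch of that idea under stated assumptions, not the authors' tool or method. It substitutes a linear autoencoder solved in closed form via SVD (the MSE-optimal linear autoencoder spans the top-k principal subspace) for the neural autoencoder used in the paper. The convergence rule in the hypothetical helper `estimate_mcs` (return the first grid size whose held-out loss improves by less than `tol`, relative, at the next larger size) is an illustrative assumption, not the paper's definition of MCS, and the data are synthetic. Because an autoencoder needs no labels, such a loss curve can be computed on unlabeled data before any labeling is done.

```python
import numpy as np

def linear_autoencoder_loss(X_train, X_val, k):
    """Fit a k-unit linear autoencoder in closed form (top-k PCA subspace)
    on X_train and return the mean squared reconstruction error on X_val."""
    mu = X_train.mean(axis=0)
    # Rows of Vt are the principal directions of the centered training data.
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    W = Vt[:k]                 # tied weights: encode with W.T, decode with W
    Z = (X_val - mu) @ W.T     # latent codes for the held-out data
    X_hat = Z @ W + mu         # reconstructions
    return float(np.mean((X_val - X_hat) ** 2))

def estimate_mcs(X_pool, X_val, sizes, k=8, tol=0.01):
    """Hypothetical convergence rule (an assumption, not the paper's):
    return the first sample size whose held-out autoencoder loss improves
    by less than `tol` (relative) at the next larger size, plus all losses."""
    losses = [linear_autoencoder_loss(X_pool[:m], X_val, k) for m in sizes]
    for i in range(len(sizes) - 1):
        rel_gain = (losses[i] - losses[i + 1]) / losses[i]
        if rel_gain < tol:
            return sizes[i], losses
    return None, losses

# Toy usage: synthetic data near a 5-dimensional subspace of R^50 plus noise.
rng = np.random.default_rng(0)
d, latent = 50, 5
basis = rng.normal(size=(latent, d))

def sample(n):
    return rng.normal(size=(n, latent)) @ basis + 0.1 * rng.normal(size=(n, d))

X_pool, X_val = sample(2000), sample(500)
sizes = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2000]
mcs, losses = estimate_mcs(X_pool, X_val, sizes, k=5)
print("estimated MCS:", mcs)
print("held-out losses:", [round(l, 4) for l in losses])
```

Swapping in a nonlinear autoencoder would only change `linear_autoencoder_loss`; the sweep over sample sizes and the convergence check stay the same. The grid `sizes` and the values of `k` and `tol` are arbitrary choices for this toy example.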

Similar Items