ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data


Bibliographic Details
Main Authors: Piccolo, Stephen R, Lee, Terry J, Suh, Erica, Hill, Kimball
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7131989/
https://www.ncbi.nlm.nih.gov/pubmed/32249316
http://dx.doi.org/10.1093/gigascience/giaa026
_version_ 1783517358523940864
author Piccolo, Stephen R
Lee, Terry J
Suh, Erica
Hill, Kimball
author_facet Piccolo, Stephen R
Lee, Terry J
Suh, Erica
Hill, Kimball
author_sort Piccolo, Stephen R
collection PubMed
description BACKGROUND: Classification algorithms assign observations to groups based on patterns in data. The machine-learning community has developed myriad classification algorithms, which are used in diverse life science research domains. Algorithm choice can affect classification accuracy dramatically, so it is crucial that researchers optimize the choice of which algorithm(s) to apply in a given research domain on the basis of empirical evidence. In benchmark studies, multiple algorithms are applied to multiple datasets, and the researcher examines overall trends. In addition, the researcher may evaluate multiple hyperparameter combinations for each algorithm and use feature selection to reduce data dimensionality. Although software implementations of classification algorithms are widely available, robust benchmark comparisons are difficult to perform when researchers wish to compare algorithms that span multiple software packages. Programming interfaces, data formats, and evaluation procedures differ across software packages, and dependency conflicts may arise during installation. FINDINGS: To address these challenges, we created ShinyLearner, an open-source project for integrating machine-learning packages into software containers. ShinyLearner provides a uniform interface for performing classification, irrespective of the library that implements each algorithm, thus facilitating benchmark comparisons. In addition, ShinyLearner enables researchers to optimize hyperparameters and select features via nested cross-validation; it tracks all nested operations and generates output files that make these steps transparent. ShinyLearner includes a Web interface to help users more easily construct the commands necessary to perform benchmark comparisons. ShinyLearner is freely available at https://github.com/srp33/ShinyLearner.
CONCLUSIONS: This software is a resource for researchers who wish to benchmark multiple classification or feature-selection algorithms on a given dataset. We hope it will serve as an example of combining the benefits of software containerization with a user-friendly approach.
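The nested cross-validation scheme the abstract describes (an inner loop that tunes hyperparameters, wrapped in an outer loop that estimates accuracy on data the tuning never saw) can be sketched generically with scikit-learn. This is an illustrative example only, not ShinyLearner's actual interface; the classifier, hyperparameter grid, and fold counts are arbitrary choices for demonstration.

```python
# Illustrative nested cross-validation sketch (not ShinyLearner's API).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: tune the SVM regularization parameter C via grid search.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
tuned_clf = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=inner_cv,
)

# Outer loop: each fold refits tuned_clf (which itself cross-validates
# on the training portion) and scores on a held-out test fold.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(tuned_clf, X, y, cv=outer_cv)
print("Per-fold accuracy:", scores)
```

Because hyperparameters are chosen only from each outer-fold training set, the outer-fold scores are an unbiased estimate of generalization accuracy, which is the property that makes nested cross-validation suitable for benchmark comparisons.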
format Online
Article
Text
id pubmed-7131989
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-71319892020-04-09 ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data Piccolo, Stephen R Lee, Terry J Suh, Erica Hill, Kimball Gigascience Technical Note Oxford University Press 2020-04-06 /pmc/articles/PMC7131989/ /pubmed/32249316 http://dx.doi.org/10.1093/gigascience/giaa026 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
Piccolo, Stephen R
Lee, Terry J
Suh, Erica
Hill, Kimball
ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data
title ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data
title_full ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data
title_fullStr ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data
title_full_unstemmed ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data
title_short ShinyLearner: A containerized benchmarking tool for machine-learning classification of tabular data
title_sort shinylearner: a containerized benchmarking tool for machine-learning classification of tabular data
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7131989/
https://www.ncbi.nlm.nih.gov/pubmed/32249316
http://dx.doi.org/10.1093/gigascience/giaa026
work_keys_str_mv AT piccolostephenr shinylearneracontainerizedbenchmarkingtoolformachinelearningclassificationoftabulardata
AT leeterryj shinylearneracontainerizedbenchmarkingtoolformachinelearningclassificationoftabulardata
AT suherica shinylearneracontainerizedbenchmarkingtoolformachinelearningclassificationoftabulardata
AT hillkimball shinylearneracontainerizedbenchmarkingtoolformachinelearningclassificationoftabulardata