A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification

Bibliographic details
Main authors:	Mendez, Kevin M., Reinke, Stacey N., Broadhurst, David I.
Format:	Online Article Text
Language:	English
Published:	Springer US 2019
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6856029/ https://www.ncbi.nlm.nih.gov/pubmed/31728648 http://dx.doi.org/10.1007/s11306-019-1612-4

collection	PubMed
description	INTRODUCTION: Metabolomics is increasingly being used in the clinical setting for disease diagnosis, prognosis and risk prediction. Machine learning algorithms are particularly important in the construction of multivariate metabolite prediction models. Historically, partial least squares (PLS) regression has been the gold standard for binary classification. Nonlinear machine learning methods such as random forests (RF), kernel support vector machines (SVM) and artificial neural networks (ANN) may be better suited to modelling possible nonlinear metabolite covariance, and thus provide better predictive models. OBJECTIVES: We hypothesise that for binary classification using metabolomics data, nonlinear machine learning methods will provide superior generalised predictive ability when compared to linear alternatives, in particular when compared with the current gold standard, PLS discriminant analysis. METHODS: We compared the general predictive performance of eight archetypal machine learning algorithms across ten publicly available clinical metabolomics data sets. The algorithms were implemented in the Python programming language. All code and results have been made publicly available as Jupyter notebooks. RESULTS: There was only marginal improvement in predictive ability for SVM and ANN over PLS across all data sets. RF performance was comparatively poor. The use of out-of-bag bootstrap confidence intervals provided a measure of the uncertainty of model prediction, such that the quality of the metabolomics data was observed to have a greater influence on generalised performance than model choice. CONCLUSION: The size of the data set, and the choice of performance metric, had a greater influence on generalised predictive performance than the choice of machine learning algorithm. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s11306-019-1612-4) contains supplementary material, which is available to authorized users.
id	pubmed-6856029
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Metabolomics
published	2019-11-15
rights	© The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
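The RESULTS section of the abstract relies on out-of-bag (OOB) bootstrap confidence intervals. The following is a minimal NumPy sketch of that idea for a PLS-DA classifier, under assumptions that this record does not state: PLS-DA is approximated as single-response PLS (PLS1, NIPALS) regression on a 0/1 class vector with mean-centred predictors, performance is measured as the area under the ROC curve (AUC), and the interval is a simple percentile interval. The function names (`pls1_fit`, `pls1_predict`, `auc`, `oob_bootstrap_auc_ci`) and defaults (`n_components=2`, `n_boot=200`) are hypothetical. This is not the authors' code, which the abstract says is available as Jupyter notebooks.

```python
# Hypothetical sketch of an OOB bootstrap AUC interval for PLS-DA (not the authors' implementation).
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS (PLS1) with NIPALS.

    Assumption: X is mean-centred only (no unit-variance scaling) and y is
    coded 0/1, so the continuous prediction serves as a PLS-DA score.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)      # unit-length weight vector
        t = Xk @ w                     # component scores
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        q = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - q * t                # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # Regression coefficients in the centred X space: B = W (P^T W)^-1 q
    B = W @ np.linalg.solve(P.T @ W, Q)
    return x_mean, y_mean, B


def pls1_predict(model, X):
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def oob_bootstrap_auc_ci(X, y, n_components=2, n_boot=200, alpha=0.05, seed=0):
    """Median and percentile interval of out-of-bag AUC over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    aucs = []
    while len(aucs) < n_boot:
        boot = rng.integers(0, n, size=n)          # rows drawn with replacement
        oob = np.setdiff1d(np.arange(n), boot)     # rows never drawn
        # Skip resamples where the training or OOB set lacks one of the classes
        if len(np.unique(y[boot])) < 2 or len(np.unique(y[oob])) < 2:
            continue
        model = pls1_fit(X[boot], y[boot], n_components)
        aucs.append(auc(y[oob], pls1_predict(model, X[oob])))
    aucs = np.array(aucs)
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return np.median(aucs), lo, hi


if __name__ == "__main__":
    # Synthetic two-class data: 30 samples per class, 5 features, one informative
    rng = np.random.default_rng(1)
    y = np.repeat([0.0, 1.0], 30)
    X = rng.normal(size=(60, 5))
    X[:, 0] += 2.0 * y
    med, lo, hi = oob_bootstrap_auc_ci(X, y)
    print(f"OOB AUC median {med:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

Each resample trains on the rows drawn with replacement and scores only the rows that were never drawn, so every AUC is computed on samples the model did not see. Resamples in which either set is missing a class are discarded because AUC is undefined for them. The width of the resulting interval, rather than the point estimate alone, is what allows differences between algorithms to be judged against the uncertainty that comes from the data.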
