
Partial Least Squares with Structured Output for Modelling the Metabolomics Data Obtained from Complex Experimental Designs: A Study into the Y-Block Coding

Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate m...


Bibliographic Details
Main Authors: Xu, Yun, Muhamadali, Howbeer, Sayqal, Ali, Dixon, Neil, Goodacre, Royston
Format: Online Article Text
Language: English
Published: MDPI 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5192444/
https://www.ncbi.nlm.nih.gov/pubmed/27801817
http://dx.doi.org/10.3390/metabo6040038
author Xu, Yun
Muhamadali, Howbeer
Sayqal, Ali
Dixon, Neil
Goodacre, Royston
collection PubMed
description Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a “pure” regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.
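The abstract contrasts binary class membership coding with a Y matrix that reflects the experimental design, but this record does not give the authors' actual coding scheme. As a rough illustration only, the sketch below builds two Y matrices for a hypothetical two-factor design. The factor names (`strain`, `dose`), the sample list, and both helper functions are assumptions made for this example and do not come from the article.

```python
# Hypothetical two-factor design: a categorical factor (strain) and a
# quantitative factor (dose). These names are illustrative assumptions.
samples = [
    ("wild_type", 0.0),
    ("wild_type", 1.0),
    ("mutant", 0.0),
    ("mutant", 1.0),
]


def class_membership_coding(samples):
    """Classic PLS-DA style coding: one binary column per distinct design cell."""
    cells = sorted(set(samples))
    return [[1 if s == c else 0 for c in cells] for s in samples]


def factor_coding(samples):
    """One column per design factor: a binary strain indicator and the numeric dose."""
    return [[1 if strain == "mutant" else 0, dose] for strain, dose in samples]


Y_classes = class_membership_coding(samples)  # 4 samples x 4 class columns
Y_factors = factor_coding(samples)            # 4 samples x 2 factor columns

for sample, y_c, y_f in zip(samples, Y_classes, Y_factors):
    print(sample, y_c, y_f)
```

In the class membership matrix every design cell is an unrelated class, so the coding carries no information that two samples share a strain or a dose. The factor-wise matrix keeps that relationship explicit, which is the general idea behind coding Y to mirror the design.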
format Online
Article
Text
id pubmed-5192444
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-5192444 2017-01-03 Metabolites Article MDPI 2016-10-28 /pmc/articles/PMC5192444/ /pubmed/27801817 http://dx.doi.org/10.3390/metabo6040038 Text en © 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
title Partial Least Squares with Structured Output for Modelling the Metabolomics Data Obtained from Complex Experimental Designs: A Study into the Y-Block Coding
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5192444/
https://www.ncbi.nlm.nih.gov/pubmed/27801817
http://dx.doi.org/10.3390/metabo6040038