
Non-targeted UHPLC-MS metabolomic data processing methods: a comparative investigation of normalisation, missing value imputation, transformation and scaling

INTRODUCTION: The generic metabolomics data processing workflow is constructed with a serial set of processes including peak picking, quality assurance, normalisation, missing value imputation, transformation and scaling. The combination of these processes should present the experimental data in an...


Bibliographic Details
Main authors:	Di Guida, Riccardo, Engel, Jasper, Allwood, J. William, Weber, Ralf J. M., Jones, Martin R., Sommer, Ulf, Viant, Mark R., Dunn, Warwick B.
Format:	Online Article Text
Language:	English
Published:	Springer US 2016
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4831991/ https://www.ncbi.nlm.nih.gov/pubmed/27123000 http://dx.doi.org/10.1007/s11306-016-1030-9

_version_	1782427175668940800
author	Di Guida, Riccardo; Engel, Jasper; Allwood, J. William; Weber, Ralf J. M.; Jones, Martin R.; Sommer, Ulf; Viant, Mark R.; Dunn, Warwick B.
author_sort	Di Guida, Riccardo
collection	PubMed
description	INTRODUCTION: The generic metabolomics data processing workflow is constructed with a serial set of processes including peak picking, quality assurance, normalisation, missing value imputation, transformation and scaling. The combination of these processes should present the experimental data in an appropriate structure so as to identify the biological changes in a valid and robust manner. OBJECTIVES: Currently, different researchers apply different data processing methods and no assessment of the permutations applied to UHPLC-MS datasets has been published. Here we wish to define the most appropriate data processing workflow. METHODS: We assess the influence of normalisation, missing value imputation, transformation and scaling methods on univariate and multivariate analysis of UHPLC-MS datasets acquired for different mammalian samples. RESULTS: Our studies have shown that once data are filtered, missing values are not correlated with m/z, retention time or response. Following an exhaustive evaluation, we recommend PQN normalisation with no missing value imputation and no transformation or scaling for univariate analysis. For PCA we recommend applying PQN normalisation with Random Forest missing value imputation, generalised logarithm (glog) transformation and no scaling. For PLS-DA we recommend PQN normalisation, KNN as the missing value imputation method, glog transformation and no scaling. These recommendations are based on searching for the biologically important metabolite features independent of their measured abundance. CONCLUSION: The appropriate choice of normalisation, missing value imputation, transformation and scaling methods differs depending on the data analysis method, and the choice of method is essential to maximise the biological information derived from UHPLC-MS datasets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11306-016-1030-9) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4831991
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Springer US
record_format	MEDLINE/PubMed
spelling	pubmed-4831991 2016-04-25 Metabolomics Original Article Springer US 2016-04-15 2016 /pmc/articles/PMC4831991/ /pubmed/27123000 http://dx.doi.org/10.1007/s11306-016-1030-9 Text en © The Author(s) 2016 Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title	Non-targeted UHPLC-MS metabolomic data processing methods: a comparative investigation of normalisation, missing value imputation, transformation and scaling
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4831991/ https://www.ncbi.nlm.nih.gov/pubmed/27123000 http://dx.doi.org/10.1007/s11306-016-1030-9
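
The description above recommends PQN normalisation and, for multivariate analysis, a generalised logarithm (glog) transformation. As a rough illustration only, the Python sketch below shows one common formulation of these two steps using NumPy. It is not the authors' implementation: the function names `pqn_normalise` and `glog`, the choice of the per-feature median across all samples as the reference, and the fixed `lam` value are assumptions made for this example. The sketch also assumes a samples-by-features matrix whose feature medians are positive, with missing values stored as NaN.

```python
import numpy as np


def pqn_normalise(X):
    """Probabilistic quotient normalisation (PQN) of a samples x features matrix.

    Assumption for this sketch: the reference profile is the per-feature
    median over all samples, and missing values are NaN.
    """
    X = np.asarray(X, dtype=float)
    reference = np.nanmedian(X, axis=0)        # one reference value per feature
    quotients = X / reference                  # feature-wise quotients per sample
    factors = np.nanmedian(quotients, axis=1)  # most probable dilution factor per sample
    return X / factors[:, None], factors


def glog(X, lam=1.0):
    """Generalised logarithm in the form log(x + sqrt(x^2 + lam)).

    `lam` is a hypothetical fixed value here; in practice the parameter
    is usually estimated from the data rather than set by hand.
    """
    X = np.asarray(X, dtype=float)
    return np.log(X + np.sqrt(X ** 2 + lam))


# Three samples that differ only by a dilution factor of 0.5, 1 and 2.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 8.0, 12.0, 16.0],
])
X_norm, factors = pqn_normalise(X)  # factors are 0.5, 1 and 2
X_glog = glog(X_norm)               # input for PCA or PLS-DA in this sketch
```

Because missing values stay as NaN through both functions, an imputation step (Random Forest or KNN in the recommendations above) would still be needed before PCA or PLS-DA; it is left out here to keep the sketch short.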


Similar Items