
Centering, scaling, and transformations: improving the biological information content of metabolomics data

BACKGROUND: Extracting relevant biological information from large data sets is a major challenge in functional genomics research. Different aspects of the data hamper their biological interpretation. For instance, 5000-fold differences in concentration for different metabolites are present in a meta...


Bibliographic Details
Main authors:	van den Berg, Robert A; Hoefsloot, Huub CJ; Westerhuis, Johan A; Smilde, Age K; van der Werf, Mariët J
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1534033/ https://www.ncbi.nlm.nih.gov/pubmed/16762068 http://dx.doi.org/10.1186/1471-2164-7-142

author	van den Berg, Robert A; Hoefsloot, Huub CJ; Westerhuis, Johan A; Smilde, Age K; van der Werf, Mariët J
collection	PubMed
description	BACKGROUND: Extracting relevant biological information from large data sets is a major challenge in functional genomics research. Different aspects of the data hamper their biological interpretation. For instance, 5000-fold differences in concentration for different metabolites are present in a metabolomics data set, while these differences are not proportional to the biological relevance of these metabolites. However, data analysis methods are not able to make this distinction. Data pretreatment methods can correct for aspects that hinder the biological interpretation of metabolomics data sets by emphasizing the biological information in the data set and thus improving their biological interpretability.
RESULTS: Different data pretreatment methods, i.e. centering, autoscaling, pareto scaling, range scaling, vast scaling, log transformation, and power transformation, were tested on a real-life metabolomics data set. They were found to greatly affect the outcome of the data analysis and thus the rank of the, from a biological point of view, most important metabolites. Furthermore, the stability of the rank, the influence of technical errors on data analysis, and the preference of data analysis methods for selecting highly abundant metabolites were affected by the data pretreatment method used prior to data analysis.
CONCLUSION: Different pretreatment methods emphasize different aspects of the data and each pretreatment method has its own merits and drawbacks. The choice for a pretreatment method depends on the biological question to be answered, the properties of the data set and the data analysis method selected. For the explorative analysis of the validation data set used in this study, autoscaling and range scaling performed better than the other pretreatment methods. That is, range scaling and autoscaling were able to remove the dependence of the rank of the metabolites on the average concentration and the magnitude of the fold changes and showed biologically sensible results after PCA (principal component analysis). In conclusion, selecting a proper data pretreatment method is an essential step in the analysis of metabolomics data and greatly affects the metabolites that are identified to be the most important.
format	Text
id	pubmed-1534033
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1534033 2006-08-10 Centering, scaling, and transformations: improving the biological information content of metabolomics data. van den Berg, Robert A; Hoefsloot, Huub CJ; Westerhuis, Johan A; Smilde, Age K; van der Werf, Mariët J. BMC Genomics. Research Article. BioMed Central 2006-06-08 /pmc/articles/PMC1534033/ /pubmed/16762068 http://dx.doi.org/10.1186/1471-2164-7-142 Text en Copyright © 2006 van den Berg et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Centering, scaling, and transformations: improving the biological information content of metabolomics data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1534033/ https://www.ncbi.nlm.nih.gov/pubmed/16762068 http://dx.doi.org/10.1186/1471-2164-7-142
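
The description lists seven pretreatment methods without giving their formulas. The sketch below shows one common way to define them for the measurements of a single metabolite. The formulas are the usual textbook definitions and are an assumption here, since the record itself does not state them; the function name pretreat and the example values are hypothetical.

```python
import math
import statistics


def pretreat(values, method):
    """Pretreat the measurements of one metabolite (hypothetical helper).

    Formulas are assumed, commonly used definitions; the catalog record
    does not specify them.
    """
    # Transformations are applied first, then the result is mean-centered.
    if method == "log":
        values = [math.log10(v) for v in values]  # needs strictly positive values
    elif method == "power":
        values = [math.sqrt(v) for v in values]   # needs non-negative values

    mean = statistics.mean(values)
    centered = [v - mean for v in values]
    if method in ("centering", "log", "power"):
        return centered

    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    if method == "autoscaling":
        divisor = sd                        # unit variance
    elif method == "pareto":
        divisor = math.sqrt(sd)             # milder than autoscaling
    elif method == "range":
        divisor = max(values) - min(values)
    elif method == "vast":
        divisor = sd * sd / mean            # autoscaling times mean / sd
    else:
        raise ValueError(f"unknown method: {method}")
    return [c / divisor for c in centered]


# Two made-up metabolites with the same relative pattern but a
# 1000-fold difference in concentration.
abundant = [1000, 2000, 3000]
scarce = [1, 2, 3]

for name in ("centering", "autoscaling", "range"):
    print(name, pretreat(abundant, name), pretreat(scarce, name))
```

With centering alone the abundant metabolite still has values 1000 times larger than the scarce one, whereas autoscaling and range scaling map both to the same values. This is the effect the description refers to when it says these two methods remove the dependence on average concentration.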
