
A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps

Untargeted metabolomic studies generate information-rich, high-dimensional, and complex datasets that remain challenging to handle and fully exploit. Despite the remarkable progress in the development of tools and algorithms, the “exhaustive” extraction of information from these metabolomic datasets...


Bibliographic Details
Main Authors:	Tugizimana, Fidele; Steenkamp, Paul A.; Piater, Lizelle A.; Dubery, Ian A.
Format:	Online Article Text
Language:	English
Published:	MDPI 2016
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5192446/ https://www.ncbi.nlm.nih.gov/pubmed/27827887 http://dx.doi.org/10.3390/metabo6040040

author	Tugizimana, Fidele; Steenkamp, Paul A.; Piater, Lizelle A.; Dubery, Ian A.
author_sort	Tugizimana, Fidele
collection	PubMed
description	Untargeted metabolomic studies generate information-rich, high-dimensional, and complex datasets that remain challenging to handle and fully exploit. Despite the remarkable progress in the development of tools and algorithms, the “exhaustive” extraction of information from these metabolomic datasets is still a non-trivial undertaking. A conversation on data mining strategies for maximal information extraction from metabolomic data is needed. Using a liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomic dataset, this study explored the influence of collection parameters in the data pre-processing step, and of scaling and data transformation, on the statistical models generated and on subsequent feature selection. Data obtained in positive mode from an LC-MS-based untargeted metabolomic study (sorghum plants responding dynamically to infection by a fungal pathogen) were used. Raw data were pre-processed with MarkerLynx™ software (Waters Corporation, Manchester, UK). Here, two parameters were varied: the intensity threshold (50–100 counts) and the mass tolerance (0.005–0.01 Da). After pre-processing, the datasets were imported into SIMCA (Umetrics, Umeå, Sweden) for further data cleaning and statistical modeling. In addition, different scaling (unit variance, Pareto, etc.) and data transformation (log and power) methods were explored. The results showed that the pre-processing parameters (or algorithms) influence the output dataset with regard to the number of defined features. Furthermore, the study demonstrates that the pre-treatment of data prior to statistical modeling affects the subspace approximation outcome: e.g., the amount of variation in X-data that the model can explain and predict. The pre-processing and pre-treatment steps subsequently influence the number of statistically significant extracted/selected features (variables). Thus, as informed by the results, to maximize the value of untargeted metabolomic data, understanding of the data structures and exploration of different algorithms and methods (at different steps of the data analysis pipeline) might currently be the best trade-off, and possibly an epistemological imperative.
format	Online Article Text
id	pubmed-5192446
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-5192446 2017-01-03 Metabolites Article MDPI 2016-11-03 /pmc/articles/PMC5192446/ /pubmed/27827887 http://dx.doi.org/10.3390/metabo6040040 Text en © 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
title	A Conversation on Data Mining Strategies in LC-MS Untargeted Metabolomics: Pre-Processing and Pre-Treatment Steps
topic	Article
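
The abstract in this record names several pre-treatment options (unit variance and Pareto scaling, log and power transformation) without defining them. The following is a minimal sketch of their common textbook definitions, assuming a samples-by-features intensity matrix. It is not the SIMCA implementation used in the study; the function names, the log offset, the power exponent, and the toy matrix are illustrative assumptions rather than values taken from the article.

```python
import numpy as np


def _safe_sd(X):
    """Column-wise sample standard deviation; zero values are replaced by 1
    so that constant features do not cause a division by zero."""
    sd = X.std(axis=0, ddof=1)
    return np.where(sd == 0, 1.0, sd)


def unit_variance_scale(X):
    """Mean-center each feature, then divide by its standard deviation."""
    return (X - X.mean(axis=0)) / _safe_sd(X)


def pareto_scale(X):
    """Mean-center each feature, then divide by the square root of its
    standard deviation (a milder correction than unit variance scaling)."""
    return (X - X.mean(axis=0)) / np.sqrt(_safe_sd(X))


def log_transform(X, offset=1.0):
    """Base-10 log transform; the offset (an assumption here) avoids log(0)."""
    return np.log10(X + offset)


def power_transform(X, exponent=0.5):
    """Power transform; the square root is used as an assumed default."""
    return np.power(X, exponent)


if __name__ == "__main__":
    # Hypothetical toy data: 4 samples (rows) x 3 features (columns).
    X = np.array([
        [120.0, 5000.0, 30.0],
        [150.0, 7000.0, 45.0],
        [90.0, 6500.0, 20.0],
        [200.0, 4000.0, 60.0],
    ])
    print(unit_variance_scale(X))
    print(pareto_scale(log_transform(X)))
```

Under these definitions, unit variance scaling gives every feature equal weight, whereas Pareto scaling leaves high-variance features somewhat more influential. This is one way in which the choice of pre-treatment can change the downstream model, as the abstract reports.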


Similar Items