Decision tree supported substructure prediction of metabolites from GC-MS profiles

Gas chromatography coupled to mass spectrometry (GC-MS) is one of the most widespread routine technologies applied to the large-scale screening and discovery of novel metabolic biomarkers. However, the majority of mass spectral tags (MSTs) currently remain unidentified due to the lack of authentica...

Bibliographic Details
Main authors: Hummel, Jan, Strehmel, Nadine, Selbig, Joachim, Walther, Dirk, Kopka, Joachim
Format: Text
Language: English
Published: Springer US 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874469/
https://www.ncbi.nlm.nih.gov/pubmed/20526350
http://dx.doi.org/10.1007/s11306-010-0198-7
author Hummel, Jan
Strehmel, Nadine
Selbig, Joachim
Walther, Dirk
Kopka, Joachim
collection PubMed
description Gas chromatography coupled to mass spectrometry (GC-MS) is one of the most widespread routine technologies applied to the large-scale screening and discovery of novel metabolic biomarkers. However, the majority of mass spectral tags (MSTs) currently remain unidentified due to the lack of authenticated pure reference substances required for compound identification by GC-MS. Here, we accessed the information on reference compounds stored in the Golm Metabolome Database (GMD) to apply supervised machine learning approaches to the classification and identification of unidentified MSTs without relying on library searches. Non-annotated MSTs with mass spectral and retention index (RI) information, together with data on already identified metabolites and reference substances, have been archived in the GMD. Structural feature extraction was applied to subdivide the metabolite space contained in the GMD and to define the prediction target classes. Decision tree (DT)-based prediction of the most frequent substructures from mass spectral features and RI information is demonstrated to yield highly sensitive and specific detection of substructures contained in the compounds. The underlying set of DTs can be inspected by the user and is available for batch processing via SOAP (Simple Object Access Protocol)-based web services. The GMD mass spectral library with the integrated DTs is freely accessible for non-commercial use at http://gmd.mpimp-golm.mpg.de/. All matching and structure search functionalities are available as SOAP-based web services. An XML + HTTP interface, which follows Representational State Transfer (REST) principles, facilitates read-only access to database entities.
format Text
id pubmed-2874469
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-2874469 2010-06-04 Decision tree supported substructure prediction of metabolites from GC-MS profiles Hummel, Jan Strehmel, Nadine Selbig, Joachim Walther, Dirk Kopka, Joachim Metabolomics Original Paper
Springer US 2010-02-16 2010 /pmc/articles/PMC2874469/ /pubmed/20526350 http://dx.doi.org/10.1007/s11306-010-0198-7 Text en © The Author(s) 2010 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
title Decision tree supported substructure prediction of metabolites from GC-MS profiles
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2874469/
https://www.ncbi.nlm.nih.gov/pubmed/20526350
http://dx.doi.org/10.1007/s11306-010-0198-7
work_keys_str_mv AT hummeljan decisiontreesupportedsubstructurepredictionofmetabolitesfromgcmsprofiles
AT strehmelnadine decisiontreesupportedsubstructurepredictionofmetabolitesfromgcmsprofiles
AT selbigjoachim decisiontreesupportedsubstructurepredictionofmetabolitesfromgcmsprofiles
AT waltherdirk decisiontreesupportedsubstructurepredictionofmetabolitesfromgcmsprofiles
AT kopkajoachim decisiontreesupportedsubstructurepredictionofmetabolitesfromgcmsprofiles
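
The decision-tree idea summarized in the description above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature layout (a retention index followed by relative intensities at two unnamed m/z channels), the toy training rows, and the helper names `gini`, `best_split`, `build_tree` and `predict`. It is not the GMD implementation and uses no GMD data; it only shows how a binary tree can learn thresholds on RI and spectral intensities to flag the presence or absence of a single substructure.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, else None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, max_depth=3):
    """Grow a tree; inner nodes are (feature, threshold, left, right) tuples."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
            build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical toy data: [RI, intensity channel A, intensity channel B],
# labelled True when the (hypothetical) substructure is present.
TRAIN = [
    ([1200.0, 0.90, 0.10], True),
    ([1350.0, 0.80, 0.20], True),
    ([1900.0, 0.85, 0.05], True),
    ([1250.0, 0.10, 0.70], False),
    ([1800.0, 0.20, 0.90], False),
    ([2100.0, 0.05, 0.60], False),
]

tree = build_tree([x for x, _ in TRAIN], [y for _, y in TRAIN])
print(predict(tree, [1500.0, 0.75, 0.15]))
```

In this toy set the RI alone does not separate the classes, so the tree splits on an intensity channel instead. One tree per substructure, each answering a yes/no question, is one plausible way to read "prediction of the most frequent substructures"; the record itself does not spell out the exact tree layout.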