Unsupervised Discovery and Comparison of Structural Families Across Multiple Samples in Untargeted Metabolomics

In untargeted metabolomics approaches, the inability to structurally annotate relevant features and map them to biochemical pathways is hampering the full exploitation of many metabolomics experiments. Furthermore, variable metabolic content across samples results in sparse feature...

Bibliographic Details
Main Authors: van der Hooft, Justin J. J., Wandy, Joe, Young, Francesca, Padmanabhan, Sandosh, Gerasimidis, Konstantinos, Burgess, Karl E. V., Barrett, Michael P., Rogers, Simon
Format: Online Article Text
Language: English
Published: American Chemical Society 2017
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5524435/
https://www.ncbi.nlm.nih.gov/pubmed/28621528
http://dx.doi.org/10.1021/acs.analchem.7b01391
author van der Hooft, Justin J. J.
Wandy, Joe
Young, Francesca
Padmanabhan, Sandosh
Gerasimidis, Konstantinos
Burgess, Karl E. V.
Barrett, Michael P.
Rogers, Simon
author_sort van der Hooft, Justin J. J.
collection PubMed
description In untargeted metabolomics approaches, the inability to structurally annotate relevant features and map them to biochemical pathways is hampering the full exploitation of many metabolomics experiments. Furthermore, variable metabolic content across samples results in sparse feature matrices that are statistically hard to handle. Here, we introduce MS2LDA+, which tackles both of these problems. Previously, we presented MS2LDA, which extracts biochemically relevant molecular substructures (“Mass2Motifs”) from a collection of fragmentation spectra as sets of co-occurring molecular fragments and neutral losses, thereby recognizing building blocks of metabolomics. Here, we extend MS2LDA to handle multiple metabolomics experiments in one analysis, resulting in MS2LDA+. By linking Mass2Motifs across samples, we expose the variability in prevalence of structurally related metabolite families. We validate the differential prevalence of substructures between two distinct sample groups and apply it to fecal samples. Subsequently, within one sample group of urines, we rank the Mass2Motifs based on their variance to assess whether xenobiotic-derived substructures are among the most-variant Mass2Motifs. Indeed, we could ascribe 22 out of the 30 most-variant Mass2Motifs to xenobiotic-derived substructures, including paracetamol/acetaminophen mercapturate and dimethylpyrogallol. In total, we structurally characterized 101 Mass2Motifs with biochemically or chemically relevant substructures. Finally, we combined the discovered metabolite families with full scan feature intensity information to obtain insight into core metabolites present in most samples and rare metabolites present in small subsets, now linked through their common substructures. We conclude that, by biochemical grouping of metabolites across samples, MS2LDA+ aids in structural annotation of metabolites and guides prioritization of analysis by using Mass2Motif prevalence.
format Online
Article
Text
id pubmed-5524435
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-5524435 2017-07-25 Unsupervised Discovery and Comparison of Structural Families Across Multiple Samples in Untargeted Metabolomics van der Hooft, Justin J. J. Wandy, Joe Young, Francesca Padmanabhan, Sandosh Gerasimidis, Konstantinos Burgess, Karl E. V. Barrett, Michael P. Rogers, Simon Anal Chem American Chemical Society 2017-06-16 2017-07-18 /pmc/articles/PMC5524435/ /pubmed/28621528 http://dx.doi.org/10.1021/acs.analchem.7b01391 Text en Copyright © 2017 American Chemical Society This is an open access article published under a Creative Commons Attribution (CC-BY) License (http://pubs.acs.org/page/policy/authorchoice_ccby_termsofuse.html), which permits unrestricted use, distribution and reproduction in any medium, provided the author and source are cited.
title Unsupervised Discovery and Comparison of Structural Families Across Multiple Samples in Untargeted Metabolomics