Nontargeted homologue series extraction from hyphenated high resolution mass spectrometry data

BACKGROUND: A large proportion of polar anthropogenic compounds routinely released into the environment comprises homologue series, i.e., sets of chemicals differing in a repeating chemical unit. Using analytical techniques such as liquid chromatography coupled to high-resolution mass spectrometry (...

Bibliographic Details
Main authors: Loos, Martin; Singer, Heinz
Format: Online Article Text
Language: English
Published: Springer International Publishing 2017
Subjects: Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5323340/
https://www.ncbi.nlm.nih.gov/pubmed/28286574
http://dx.doi.org/10.1186/s13321-017-0197-z
_version_ 1782510008110415872
author Loos, Martin
Singer, Heinz
author_facet Loos, Martin
Singer, Heinz
author_sort Loos, Martin
collection PubMed
description BACKGROUND: A large proportion of polar anthropogenic compounds routinely released into the environment comprises homologue series, i.e., sets of chemicals differing in a repeating chemical unit. Using analytical techniques such as liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS), these compounds are readily measurable as signal sets with characteristic differences in mass and typically retention time. However, despite such distinct characteristics, no computational approach for the direct, simultaneous and untargeted detection of all such signal sets comprising both LC and HRMS information has been presented to date.
RESULTS: A fast two-stage approach has been developed to extract LC-HRMS signal patterns that can be indicative of homologous analytes. In the first stage, a k-d tree representation of picked LC-HRMS peaks is used to extract all feasible 3-tuples of peaks with restrictions in, e.g., mass defect differences. The second stage then recombines these 3-tuples into larger series tuples while ensuring smooth changes in their retention time characteristics. This unsupervised approach was evaluated for ten effluent samples from Swiss sewage treatment plants (STPs), in both positive and negative electrospray ionization.
CONCLUSIONS: Besides recovering all continuous series of previously identified homologues, substantial fractions of nontargeted peaks could subsequently be assigned to very diverse peak series, although assignments were often not unique. These ambiguities were resolved by a self-organizing map technique, which revealed both distinctive series meshing and rivaling combinatorial solutions in the presence of isobaric or gapped series peaks. When comparing STPs, several ubiquitous yet partially low-frequency series mass differences emerged and may prioritize future identification efforts. The presented algorithm is freely available as part of the R package nontarget and as a user-friendly web interface at www.envihomolog.eawag.ch. [Figure: see text]
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-017-0197-z) contains supplementary material, which is available to authorized users.
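The two-stage idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and not the API of the R package nontarget: the function names, parameters and toy peak list are hypothetical, a sorted m/z list with binary search stands in for the k-d tree, retention time is assumed to increase along a series, and the mass defect restriction is left out for brevity.

```python
from bisect import bisect_left, bisect_right


def find_triples(peaks, min_dmz, max_dmz, mz_tol, max_drt):
    """Stage 1 (sketch): index triples with equal m/z spacing and rising RT.

    peaks is a list of (mz, rt) tuples. Assumes min_dmz > mz_tol > 0.
    """
    order = sorted(range(len(peaks)), key=lambda i: peaks[i][0])
    mzs = [peaks[i][0] for i in order]

    def in_window(lo, hi):
        # Indices of all peaks with lo <= m/z <= hi (binary search,
        # used here as a simple stand-in for a k-d tree range query).
        return order[bisect_left(mzs, lo):bisect_right(mzs, hi)]

    triples = []
    for i in order:
        mz_i, rt_i = peaks[i]
        for j in in_window(mz_i + min_dmz, mz_i + max_dmz):
            mz_j, rt_j = peaks[j]
            if not 0 < rt_j - rt_i <= max_drt:
                continue
            dmz = mz_j - mz_i  # candidate repeating-unit mass difference
            for k in in_window(mz_j + dmz - mz_tol, mz_j + dmz + mz_tol):
                if 0 < peaks[k][1] - rt_j <= max_drt:
                    triples.append((i, j, k))
    return triples


def merge_triples(triples, peaks, max_step_change):
    """Stage 2 (sketch): chain triples that overlap in two peaks."""
    # Keep only triples whose two RT steps differ by a small amount.
    smooth = []
    for i, j, k in triples:
        step1 = peaks[j][1] - peaks[i][1]
        step2 = peaks[k][1] - peaks[j][1]
        if abs(step2 - step1) <= max_step_change:
            smooth.append((i, j, k))

    followers = {}
    for i, j, k in smooth:
        followers.setdefault((i, j), []).append(k)
    tails = {(j, k) for _, j, k in smooth}

    series = []

    def extend(chain):
        nxt = followers.get((chain[-2], chain[-1]), [])
        if not nxt:
            series.append(chain)
        for k in nxt:
            extend(chain + [k])

    # Start only from pairs that do not continue an earlier triple.
    for pair in followers:
        if pair not in tails:
            extend(list(pair))
    return series


# Toy data: four peaks spaced by 44.0262 m/z plus two unrelated peaks.
peaks = [
    (300.0000, 5.0), (344.0262, 5.6), (388.0524, 6.1), (432.0786, 6.5),
    (350.1234, 2.0), (410.5000, 9.0),
]
triples = find_triples(peaks, min_dmz=10.0, max_dmz=60.0,
                       mz_tol=0.002, max_drt=2.0)
series = merge_triples(triples, peaks, max_step_change=0.3)
print(series)
```

With these toy values, the four evenly spaced peaks should be reported as a single series of peak indices, while the two unrelated peaks are left unassigned. Because every maximal chain is enumerated, a peak can end up in more than one series, which mirrors the non-unique assignments mentioned in the conclusions.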
format Online
Article
Text
id pubmed-5323340
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-5323340 2017-03-10 Nontargeted homologue series extraction from hyphenated high resolution mass spectrometry data Loos, Martin Singer, Heinz J Cheminform Methodology Springer International Publishing 2017-02-23 /pmc/articles/PMC5323340/ /pubmed/28286574 http://dx.doi.org/10.1186/s13321-017-0197-z Text en © The Author(s) 2017
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Nontargeted homologue series extraction from hyphenated high resolution mass spectrometry data
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5323340/
https://www.ncbi.nlm.nih.gov/pubmed/28286574
http://dx.doi.org/10.1186/s13321-017-0197-z