Bypassing the Identification: MS2Quant for Concentration Estimations of Chemicals Detected with Nontarget LC-HRMS from MS(2) Data

Nontarget analysis by liquid chromatography–high-resolution mass spectrometry (LC-HRMS) is now widely used to detect pollutants in the environment. Shifting away from targeted methods has led to detection of previously unseen chemicals, and assessing the risk posed by these newly d...

Bibliographic Details
Main Authors: Sepman, Helen; Malm, Louise; Peets, Pilleriin; MacLeod, Matthew; Martin, Jonathan; Breitholtz, Magnus; Kruve, Anneli
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448440/
https://www.ncbi.nlm.nih.gov/pubmed/37548594
http://dx.doi.org/10.1021/acs.analchem.3c01744
author Sepman, Helen
Malm, Louise
Peets, Pilleriin
MacLeod, Matthew
Martin, Jonathan
Breitholtz, Magnus
Kruve, Anneli
author_sort Sepman, Helen
collection PubMed
description Nontarget analysis by liquid chromatography–high-resolution mass spectrometry (LC-HRMS) is now widely used to detect pollutants in the environment. Shifting away from targeted methods has led to the detection of previously unseen chemicals, and assessing the risk posed by these newly detected chemicals is an important challenge. Assessing the exposure and toxicity of chemicals detected with nontarget HRMS depends heavily on knowing the structure of the chemical. However, the majority of features detected in nontarget screening remain unidentified, which hampers risk assessment with conventional tools. Here, we developed MS2Quant, a machine learning model that enables prediction of concentration from fragmentation (MS(2)) spectra of detected but unidentified chemicals. MS2Quant is an xgbTree algorithm-based regression model developed using ionization efficiency data for 1191 unique chemicals, spanning 8 orders of magnitude. The ionization efficiency values are predicted from structural fingerprints that can be computed from the SMILES notation of identified chemicals or from the MS(2) spectra of unidentified chemicals using SIRIUS+CSI:FingerID software. The root mean square errors of the training and test sets were 0.55 (3.5×) and 0.80 (6.3×) log units, respectively. In comparison, ionization efficiency prediction approaches that depend on assigning an unequivocal structure typically yield errors from 2× to 6×. The MS2Quant quantification model was validated on a set of 39 environmental pollutants and resulted in a mean prediction error of 7.4×, a geometric mean of 4.5×, and a median of 4.0×. For comparison, a model based on PaDEL descriptors, which depends on unequivocal structural assignment, was developed using the same dataset. The latter approach yielded a comparable mean prediction error of 9.5×, a geometric mean of 5.6×, and a median of 5.2× on the validation set chemicals when the top structural assignment was used as input. This confirms that MS2Quant makes it possible to extract exposure information for unidentified chemicals which, although detected, have thus far been disregarded due to the lack of accurate tools for quantification. The MS2Quant model is available as an R package on GitHub for improving discovery and monitoring of potentially hazardous environmental pollutants with nontarget screening.
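The error figures in the abstract are reported both in log units and as fold errors (for example 0.55 log units alongside 3.5×). The sketch below shows how the two might be related and how a mean, geometric mean, and median fold error could be summarized. It is not taken from the MS2Quant package: the function names are hypothetical, the fold error is assumed to be the larger of predicted/measured and measured/predicted, and the concentration pairs are made-up illustration values.

```python
from statistics import geometric_mean, mean, median


def log_units_to_fold(log_error: float) -> float:
    """Convert an error in log10 units to a multiplicative (fold) error."""
    return 10.0 ** log_error


def fold_error(predicted: float, measured: float) -> float:
    """Symmetric fold error (assumed definition): always >= 1."""
    if predicted <= 0 or measured <= 0:
        raise ValueError("concentrations must be positive")
    return max(predicted / measured, measured / predicted)


def summarize_fold_errors(pairs):
    """Return mean, geometric mean, and median fold error for (predicted, measured) pairs."""
    folds = [fold_error(p, m) for p, m in pairs]
    return {
        "mean": mean(folds),
        "geometric_mean": geometric_mean(folds),
        "median": median(folds),
    }


if __name__ == "__main__":
    # RMSE values quoted in the abstract, converted to fold errors.
    print(round(log_units_to_fold(0.55), 1))  # 3.5
    print(round(log_units_to_fold(0.80), 1))  # 6.3

    # Hypothetical (predicted, measured) concentrations, not data from the paper.
    example_pairs = [(1.0, 2.0), (10.0, 5.0), (3.0, 24.0)]
    print(summarize_fold_errors(example_pairs))
```

Because a few large outliers pull the arithmetic mean upward, the mean fold error is usually larger than the geometric mean and the median, which matches the ordering of the values reported in the abstract.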
format Online
Article
Text
id pubmed-10448440
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10448440 2023-08-25 Anal Chem American Chemical Society 2023-08-07 /pmc/articles/PMC10448440/ /pubmed/37548594 http://dx.doi.org/10.1021/acs.analchem.3c01744 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).
title Bypassing the Identification: MS2Quant for Concentration Estimations of Chemicals Detected with Nontarget LC-HRMS from MS(2) Data