Bypassing the Identification: MS2Quant for Concentration Estimations of Chemicals Detected with Nontarget LC-HRMS from MS(2) Data
Nontarget analysis by liquid chromatography–high-resolution mass spectrometry (LC-HRMS) is now widely used to detect pollutants in the environment. Shifting away from targeted methods has led to detection of previously unseen chemicals, and assessing the risk posed by these newly d...
Main authors: | Sepman, Helen; Malm, Louise; Peets, Pilleriin; MacLeod, Matthew; Martin, Jonathan; Breitholtz, Magnus; Kruve, Anneli |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448440/ https://www.ncbi.nlm.nih.gov/pubmed/37548594 http://dx.doi.org/10.1021/acs.analchem.3c01744 |
_version_ | 1785094733822951424 |
---|---|
author | Sepman, Helen; Malm, Louise; Peets, Pilleriin; MacLeod, Matthew; Martin, Jonathan; Breitholtz, Magnus; Kruve, Anneli |
author_sort | Sepman, Helen |
collection | PubMed |
description | Nontarget analysis by liquid chromatography–high-resolution mass spectrometry (LC-HRMS) is now widely used to detect pollutants in the environment. Shifting away from targeted methods has led to detection of previously unseen chemicals, and assessing the risk posed by these newly detected chemicals is an important challenge. Assessing the exposure and toxicity of chemicals detected with nontarget HRMS depends heavily on knowing the structure of the chemical. However, the majority of features detected in nontarget screening remain unidentified, which hampers risk assessment with conventional tools. Here, we developed MS2Quant, a machine learning model that enables prediction of concentration from fragmentation (MS(2)) spectra of detected but unidentified chemicals. MS2Quant is an xgbTree algorithm-based regression model developed using ionization efficiency data for 1191 unique chemicals spanning 8 orders of magnitude. The ionization efficiency values are predicted from structural fingerprints that can be computed from the SMILES notation of identified chemicals or from MS(2) spectra of unidentified chemicals using SIRIUS+CSI:FingerID software. The root mean square errors of the training and test sets were 0.55 (3.5×) and 0.80 (6.3×) log-units, respectively. In comparison, ionization efficiency prediction approaches that depend on assigning an unequivocal structure typically yield errors from 2× to 6×. The MS2Quant quantification model was validated on a set of 39 environmental pollutants and resulted in a mean prediction error of 7.4×, a geometric mean of 4.5×, and a median of 4.0×. For comparison, a model based on PaDEL descriptors that depends on unequivocal structural assignment was developed using the same dataset. The latter approach yielded a comparable mean prediction error of 9.5×, a geometric mean of 5.6×, and a median of 5.2× on the validation set chemicals when the top structural assignment was used as input. This confirms that MS2Quant enables the extraction of exposure information for unidentified chemicals which, although detected, have thus far been disregarded due to a lack of accurate tools for quantification. The MS2Quant model is available as an R package on GitHub for improving discovery and monitoring of potentially hazardous environmental pollutants with nontarget screening. |
format | Online Article Text |
id | pubmed-10448440 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10448440 2023-08-25 Bypassing the Identification: MS2Quant for Concentration Estimations of Chemicals Detected with Nontarget LC-HRMS from MS(2) Data. Sepman, Helen; Malm, Louise; Peets, Pilleriin; MacLeod, Matthew; Martin, Jonathan; Breitholtz, Magnus; Kruve, Anneli. Anal Chem. American Chemical Society 2023-08-07 /pmc/articles/PMC10448440/ /pubmed/37548594 http://dx.doi.org/10.1021/acs.analchem.3c01744 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/). |
title | Bypassing the Identification: MS2Quant for Concentration Estimations of Chemicals Detected with Nontarget LC-HRMS from MS(2) Data |
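
The abstract reports errors both in log-units and as fold factors (for example, 0.55 log-units is given as 3.5×) and summarizes validation errors with a mean, a geometric mean, and a median. The following is a minimal Python sketch of that arithmetic, not the authors' code. It assumes base-10 logarithms and a symmetric fold error defined as the larger of predicted/actual and actual/predicted; this record does not state that definition. The function names and the sample concentrations are hypothetical.

```python
import statistics


def log_units_to_fold(log_error: float) -> float:
    """Convert an error in log10 units to a multiplicative (fold) error."""
    return 10 ** log_error


def fold_error(predicted: float, actual: float) -> float:
    """Symmetric fold error (assumed definition): always >= 1."""
    ratio = predicted / actual
    return max(ratio, 1 / ratio)


def summarize(fold_errors):
    """Mean, geometric mean, and median of a list of fold errors."""
    return {
        "mean": statistics.fmean(fold_errors),
        "geometric_mean": statistics.geometric_mean(fold_errors),
        "median": statistics.median(fold_errors),
    }


# RMSE values quoted in the abstract, converted to fold errors.
print(round(log_units_to_fold(0.55), 1))  # 3.5
print(round(log_units_to_fold(0.80), 1))  # 6.3

# Hypothetical predicted vs. actual concentrations (illustration only).
predicted = [2.0, 0.5, 40.0]
actual = [1.0, 1.0, 4.0]
errors = [fold_error(p, a) for p, a in zip(predicted, actual)]
print(summarize(errors))
```

In this toy data a single large error raises the arithmetic mean well above the geometric mean and the median, the same ordering as in the validation figures quoted in the abstract (7.4×, 4.5×, 4.0×).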