Predicting the Diagnostic Information of Tandem Mass Spectra of Environmentally Relevant Compounds Using Machine Learning

Acquisition and processing of informative tandem mass spectra (MS2) is crucial for numerous applications, including library-based (tentative) identification, feature prioritization, and prediction of chemical and toxicological characteristics. However, for environmentally relevant...

Bibliographic Details
Main Authors: Codrean, S., Kruit, B., Meekel, N., Vughs, D., Béen, F.
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603772/
https://www.ncbi.nlm.nih.gov/pubmed/37812582
http://dx.doi.org/10.1021/acs.analchem.3c03470
author Codrean, S.
Kruit, B.
Meekel, N.
Vughs, D.
Béen, F.
collection PubMed
description Acquisition and processing of informative tandem mass spectra (MS2) is crucial for numerous applications, including library-based (tentative) identification, feature prioritization, and prediction of chemical and toxicological characteristics. However, for environmentally relevant compounds, approaches to automatically assess the quality of the MS2 spectra are missing. This work focused on developing a machine learning-based approach to automatically evaluate the diagnostic information of MS2 spectra (e.g., number, distribution, and intensity of diagnostic fragments) of environmentally relevant compounds analyzed with electrospray ionization. For this, approximately 1400 MS2 spectra of 204 environmental contaminants, acquired with different collision energies using liquid chromatography coupled to high-resolution mass spectrometry, were used to train a random forest classifier to distinguish between spectra providing good or poor diagnostic information. Prior to training, validation, and testing, spectra were manually labeled based on criteria such as number, intensity, range of fragments present, molecular ion intensity, and noise levels. Subsequently, feature engineering and selection were applied to retrieve relevant variables from raw MS2 spectra as inputs for the classifier. The optimal set of features based on model performances was selected and used to train a final model, which showed an accuracy of 84%, a precision of 88%, and a recall of 75%. Results show that the combination of selected features and the machine learning model used here can effectively distinguish between MS2 spectra providing good or poor diagnostic information according to the defined criteria. The developed model has the potential to improve a broad range of applications that rely on MS2 data.
format Online
Article
Text
id pubmed-10603772
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10603772 2023-10-28 Predicting the Diagnostic Information of Tandem Mass Spectra of Environmentally Relevant Compounds Using Machine Learning Codrean, S. Kruit, B. Meekel, N. Vughs, D. Béen, F. Anal Chem American Chemical Society 2023-10-09 /pmc/articles/PMC10603772/ /pubmed/37812582 http://dx.doi.org/10.1021/acs.analchem.3c03470 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained.
title Predicting the Diagnostic Information of Tandem Mass Spectra of Environmentally Relevant Compounds Using Machine Learning
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603772/
https://www.ncbi.nlm.nih.gov/pubmed/37812582
http://dx.doi.org/10.1021/acs.analchem.3c03470
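
The abstract in this record mentions labeling criteria (number, intensity, and range of fragments, molecular ion intensity, noise levels) and reports accuracy, precision, and recall for the final classifier. The following is only an illustrative sketch of how descriptors of that kind and those three metrics could be computed; it is not the authors' feature set or code. The function names, the `mz_tol` tolerance, and the 1% relative-intensity noise cutoff are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class SpectrumDescriptors:
    n_fragments: int                # signal peaks other than the precursor
    mz_range: float                 # m/z span covered by those fragments
    precursor_rel_intensity: float  # precursor intensity / base peak intensity
    noise_fraction: float           # share of peaks below the noise cutoff


def describe_spectrum(peaks, precursor_mz, mz_tol=0.01, noise_rel_cutoff=0.01):
    """Summarize one MS2 peak list given as (m/z, intensity) pairs.

    mz_tol and noise_rel_cutoff are assumed values, not taken from the paper.
    """
    if not peaks:
        return SpectrumDescriptors(0, 0.0, 0.0, 0.0)
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return SpectrumDescriptors(0, 0.0, 0.0, 0.0)

    def is_precursor(mz):
        return abs(mz - precursor_mz) <= mz_tol

    precursor_int = max((i for mz, i in peaks if is_precursor(mz)), default=0.0)
    # Peaks at or above the relative cutoff count as signal, the rest as noise.
    signal = [(mz, i) for mz, i in peaks if i / base >= noise_rel_cutoff]
    fragments = [mz for mz, _ in signal if not is_precursor(mz)]
    mz_range = max(fragments) - min(fragments) if fragments else 0.0
    return SpectrumDescriptors(
        n_fragments=len(fragments),
        mz_range=mz_range,
        precursor_rel_intensity=precursor_int / base,
        noise_fraction=(len(peaks) - len(signal)) / len(peaks),
    )


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = good, 0 = poor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up peak list and labels, purely for demonstration.
example_peaks = [(50.0, 5.0), (91.05, 400.0), (119.05, 1000.0),
                 (150.10, 250.0), (80.0, 2.0)]
example_descriptors = describe_spectrum(example_peaks, precursor_mz=150.10)
example_metrics = classification_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                                         [1, 1, 0, 0, 0, 0, 0, 0])
```

Descriptors like these could serve as input columns for a classifier such as a random forest. A precision above recall, as reported in the abstract (88% versus 75%), means that spectra predicted as good were usually good, while a larger share of the truly good spectra was missed.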