Predicting the Diagnostic Information of Tandem Mass Spectra of Environmentally Relevant Compounds Using Machine Learning
Main authors: | Codrean, S.; Kruit, B.; Meekel, N.; Vughs, D.; Béen, F. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10603772/ https://www.ncbi.nlm.nih.gov/pubmed/37812582 http://dx.doi.org/10.1021/acs.analchem.3c03470 |
author | Codrean, S.; Kruit, B.; Meekel, N.; Vughs, D.; Béen, F. |
---|---|
collection | PubMed |
description | Acquisition and processing of informative tandem mass spectra (MS2) is crucial for numerous applications, including library-based (tentative) identification, feature prioritization, and prediction of chemical and toxicological characteristics. However, for environmentally relevant compounds, approaches to automatically assess the quality of the MS2 spectra are missing. This work focused on developing a machine learning-based approach to automatically evaluate the diagnostic information of MS2 spectra (e.g., number, distribution, and intensity of diagnostic fragments) of environmentally relevant compounds analyzed with electrospray ionization. For this, approximately 1400 MS2 spectra of 204 environmental contaminants, acquired with different collision energies using liquid chromatography coupled to high-resolution mass spectrometry, were used to train a random forest classifier to distinguish between spectra providing good or poor diagnostic information. Prior to training, validation, and testing, spectra were manually labeled based on criteria such as number, intensity, range of fragments present, molecular ion intensity, and noise levels. Subsequently, feature engineering and selection were applied to retrieve relevant variables from raw MS2 spectra as inputs for the classifier. The optimal set of features based on model performances was selected and used to train a final model, which showed an accuracy of 84%, a precision of 88%, and a recall of 75%. Results show that the combination of selected features and the machine learning model used here can effectively distinguish between MS2 spectra providing good or poor diagnostic information according to the defined criteria. The developed model has the potential to improve a broad range of applications that rely on MS2 data. |
id | pubmed-10603772 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-10603772, 2023-10-28; Anal Chem; American Chemical Society, 2023-10-09; Text, en; © 2023 The Authors. Published by American Chemical Society. Licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/), which permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained. |
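The description says that features were engineered from raw MS2 spectra, fed to a random forest classifier, and evaluated by accuracy, precision, and recall. This record does not list the actual features. The Python sketch below is therefore built on stated assumptions. It treats a spectrum as a list of centroided (m/z, intensity) pairs and derives features of the kind named in the labeling criteria: number, intensity, and m/z range of fragments, precursor (molecular) ion intensity, and noise level. It also computes the three reported metrics. The feature definitions, the thresholds `rel_threshold` and `mz_tol`, the helper names `extract_features` and `classification_metrics`, and the choice of "good" (label 1) as the positive class are all hypothetical. They are not the authors' implementation.

```python
from dataclasses import dataclass
from statistics import median


# Hypothetical feature set (assumption): the record does not list the authors' features.
@dataclass
class SpectrumFeatures:
    n_fragments: int          # fragment peaks at or above rel_threshold, precursor excluded
    mz_range: float           # max - min m/z of those fragment peaks
    precursor_rel_int: float  # precursor ion intensity divided by base peak intensity
    fragment_int_frac: float  # fraction of total ion current carried by those fragments
    noise_level: float        # median relative intensity of peaks below rel_threshold


def extract_features(peaks, precursor_mz, rel_threshold=0.01, mz_tol=0.005):
    """Derive illustrative features from one centroided MS2 spectrum.

    peaks: list of (mz, intensity) tuples. rel_threshold (fraction of the base
    peak) and mz_tol (Da) are assumed values chosen only for illustration.
    """
    base = max((i for _, i in peaks), default=0.0)
    if base <= 0:
        return SpectrumFeatures(0, 0.0, 0.0, 0.0, 0.0)
    total = sum(i for _, i in peaks)
    # Most intense peak within mz_tol of the precursor m/z (0.0 if it is absent).
    precursor_int = max(
        (i for mz, i in peaks if abs(mz - precursor_mz) <= mz_tol), default=0.0
    )
    # Fragments: peaks away from the precursor that pass the relative-intensity cut.
    fragments = [
        (mz, i)
        for mz, i in peaks
        if abs(mz - precursor_mz) > mz_tol and i / base >= rel_threshold
    ]
    # Crude noise proxy: relative intensities of all peaks below the cut.
    noise = [i / base for _, i in peaks if i / base < rel_threshold]
    frag_mz = [mz for mz, _ in fragments]
    return SpectrumFeatures(
        n_fragments=len(fragments),
        mz_range=(max(frag_mz) - min(frag_mz)) if frag_mz else 0.0,
        precursor_rel_int=precursor_int / base,
        fragment_int_frac=sum(i for _, i in fragments) / total,
        noise_level=median(noise) if noise else 0.0,
    )


def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall), treating `positive` as the 'good' class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

In practice, one feature vector per spectrum could be passed to a random forest, for example scikit-learn's `RandomForestClassifier` through its `fit(X, y)` and `predict(X)` methods. scikit-learn's `accuracy_score`, `precision_score`, and `recall_score` would replace the hand-written metric function. The metric function follows the standard definitions: accuracy = correct / N, precision = TP / (TP + FP), and recall = TP / (TP + FN). A precision above recall, as reported here, means the model rarely labels a poor spectrum as good but misses some good ones.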