Predicting RP-LC retention indices of structurally unknown chemicals from mass spectrometry data

Bibliographic Details
Main authors: Boelrijk, Jim; van Herwerden, Denice; Ensing, Bernd; Forré, Patrick; Samanipour, Saer
Format: Online Article Text
Language: English
Published: Springer International Publishing 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9960388/
https://www.ncbi.nlm.nih.gov/pubmed/36829215
http://dx.doi.org/10.1186/s13321-023-00699-8
Description
Summary: Non-target analysis combined with liquid chromatography high resolution mass spectrometry is considered one of the most comprehensive strategies for the detection and identification of known and unknown chemicals in complex samples. However, many compounds remain unidentified due to data complexity and the limited number of structures in chemical databases. In this work, we have developed and validated a novel machine learning algorithm to predict the retention index ($r_i$) values for structurally (un)known chemicals based on their measured fragmentation pattern. The developed model, for the first time, enabled the prediction of $r_i$ values without the need for the exact structure of the chemicals, with an $R^2$ of 0.91 and 0.77 and a root mean squared error (RMSE) of 47 and 67 $r_i$ units for the NORMAN ([Formula: see text]) and amide ([Formula: see text]) test sets, respectively. This fragment-based model achieved an accuracy in $r_i$ prediction comparable to that of conventional descriptor-based models that rely on a known chemical structure, which obtained an $R^2$ of 0.85 with an RMSE of 67. Supplementary information: The online version contains supplementary material available at 10.1186/s13321-023-00699-8.
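
The summary reports model quality as a coefficient of determination and a root mean squared error (RMSE). As a minimal sketch of how these two metrics are commonly computed, the Python below uses the textbook definitions. The function names and the four retention index values are hypothetical and chosen only for illustration; they are not data from the article, and the article may define or compute its metrics differently.

```python
import math


def rmse(measured, predicted):
    """Root mean squared error, in the same units as the inputs."""
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical retention index values, for illustration only.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print(f"RMSE = {rmse(measured, predicted):.1f} retention index units")
print(f"R^2  = {r_squared(measured, predicted):.3f}")
```

The RMSE is expressed in retention index units and depends on the scale of the data, while the coefficient of determination is dimensionless, which is why the two are usually reported together.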