Multi-Task Neural Networks and Molecular Fingerprints to Enhance Compound Identification from LC-MS/MS Data

Mass spectrometry (MS) is widely used for the identification of chemical compounds by matching the experimentally acquired mass spectrum against a database of reference spectra. However, this approach suffers from a limited coverage of the existing databases causing a failure in the identification o...

Bibliographic Details
Main authors: Consonni, Viviana, Gosetti, Fabio, Termopoli, Veronica, Todeschini, Roberto, Valsecchi, Cecile, Ballabio, Davide
Format: Online Article Text
Language: English
Published: MDPI 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9502453/
https://www.ncbi.nlm.nih.gov/pubmed/36144564
http://dx.doi.org/10.3390/molecules27185827
author Consonni, Viviana
Gosetti, Fabio
Termopoli, Veronica
Todeschini, Roberto
Valsecchi, Cecile
Ballabio, Davide
collection PubMed
description Mass spectrometry (MS) is widely used for the identification of chemical compounds by matching the experimentally acquired mass spectrum against a database of reference spectra. However, this approach suffers from a limited coverage of the existing databases causing a failure in the identification of a compound not present in the database. Among the computational approaches for mining metabolite structures based on MS data, one option is to predict molecular fingerprints from the mass spectra by means of chemometric strategies and then use them to screen compound libraries. This can be carried out by calibrating multi-task artificial neural networks from large datasets of mass spectra, used as inputs, and molecular fingerprints as outputs. In this study, we prepared a large LC-MS/MS dataset from an on-line open repository. These data were used to train and evaluate deep-learning-based approaches to predict molecular fingerprints and retrieve the structure of unknown compounds from their LC-MS/MS spectra. Effects of data sparseness and the impact of different strategies of data curing and dimensionality reduction on the output accuracy have been evaluated. Moreover, extensive diagnostics have been carried out to evaluate modelling advantages and drawbacks as a function of the explored chemical space.
format Online
Article
Text
id pubmed-9502453
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9502453 2022-09-24 Molecules Article MDPI 2022-09-08 /pmc/articles/PMC9502453/ /pubmed/36144564 http://dx.doi.org/10.3390/molecules27185827 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Multi-Task Neural Networks and Molecular Fingerprints to Enhance Compound Identification from LC-MS/MS Data
topic Article