
Identification of metabolites from tandem mass spectra with a machine learning approach utilizing structural features

MOTIVATION: Untargeted mass spectrometry (MS/MS) is a powerful method for detecting metabolites in biological samples. However, fast and accurate identification of the metabolites’ structures from MS/MS spectra is still a great challenge. RESULTS: We present a new analysis method, called SubFragment...

Bibliographic Details
Main authors: Li, Yuanyue; Kuhn, Michael; Gavin, Anne-Claude; Bork, Peer
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703789/
https://www.ncbi.nlm.nih.gov/pubmed/31605112
http://dx.doi.org/10.1093/bioinformatics/btz736
author Li, Yuanyue
Kuhn, Michael
Gavin, Anne-Claude
Bork, Peer
collection PubMed
description MOTIVATION: Untargeted mass spectrometry (MS/MS) is a powerful method for detecting metabolites in biological samples. However, fast and accurate identification of the metabolites’ structures from MS/MS spectra is still a great challenge. RESULTS: We present a new analysis method, called SubFragment-Matching (SF-Matching), that is based on the hypothesis that molecules with similar structural features will exhibit similar fragmentation patterns. We combine information on fragmentation patterns of molecules with shared substructures and then use random forest models to predict whether a given structure can yield a certain fragmentation pattern. These models can then be used to score candidate molecules for a given mass spectrum. For rapid identification, we pre-compute such scores for common biological molecular structure databases. Using benchmarking datasets, we find that our method has similar performance to CSI:FingerID and that very high accuracies can be achieved by combining our method with CSI:FingerID. Rarefaction analysis of the training dataset shows that the performance of our method will increase as more experimental data become available. AVAILABILITY AND IMPLEMENTATION: SF-Matching is available from http://www.bork.embl.de/Docu/sf_matching. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7703789
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7703789 2020-12-07 Identification of metabolites from tandem mass spectra with a machine learning approach utilizing structural features Li, Yuanyue Kuhn, Michael Gavin, Anne-Claude Bork, Peer Bioinformatics Original Papers Oxford University Press 2020-02-15 2019-10-12 /pmc/articles/PMC7703789/ /pubmed/31605112 http://dx.doi.org/10.1093/bioinformatics/btz736 Text en © The Author(s) 2019. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of metabolites from tandem mass spectra with a machine learning approach utilizing structural features
topic Original Papers