Fast metabolite identification with Input Output Kernel Regression

Motivation: An important problem in metabolomics is the identification of metabolites from tandem mass spectrometry data. Machine learning methods have recently been proposed to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular struc...

Bibliographic Details
Main authors: Brouard, Céline; Shen, Huibin; Dührkop, Kai; d'Alché-Buc, Florence; Böcker, Sebastian; Rousu, Juho
Format: Online Article Text
Language: English
Published: Oxford University Press, 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908330/
https://www.ncbi.nlm.nih.gov/pubmed/27307628
http://dx.doi.org/10.1093/bioinformatics/btw246
author Brouard, Céline
Shen, Huibin
Dührkop, Kai
d'Alché-Buc, Florence
Böcker, Sebastian
Rousu, Juho
collection PubMed
description Motivation: An important problem in metabolomics is the identification of metabolites from tandem mass spectrometry data. Machine learning methods have recently been proposed to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output spaces and can handle structured output spaces such as the space of molecules. Results: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. The method approximates the spectra-molecule mapping in two phases. The first phase is a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, which consists of mapping the predicted output feature vectors back to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method decreases the running times of the training step and the test step by several orders of magnitude compared with the preceding methods. Availability and implementation: Contact: celine.brouard@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4908330
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4908330 2016-06-17 Fast metabolite identification with Input Output Kernel Regression Brouard, Céline Shen, Huibin Dührkop, Kai d'Alché-Buc, Florence Böcker, Sebastian Rousu, Juho Bioinformatics ISMB 2016 Proceedings, July 8 to July 12, 2016, Orlando, Florida
Oxford University Press 2016-06-15 2016-06-11 /pmc/articles/PMC4908330/ /pubmed/27307628 http://dx.doi.org/10.1093/bioinformatics/btw246 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Fast metabolite identification with Input Output Kernel Regression
topic ISMB 2016 Proceedings, July 8 to July 12, 2016, Orlando, Florida
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908330/
https://www.ncbi.nlm.nih.gov/pubmed/27307628
http://dx.doi.org/10.1093/bioinformatics/btw246
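
The description above summarizes Input Output Kernel Regression as two phases: a regression from the input (spectra) space into the feature space of the output kernel, followed by a preimage step that maps the prediction back to a molecule. The Python sketch below illustrates that idea on toy data and is not the authors' implementation. It assumes kernel ridge regression for the first phase and, for the second phase, a search over a finite candidate list scored by the output kernel. The Gaussian input kernel, the normalized linear output kernel, the toy vectors, the parameter values, and all function names are illustrative assumptions that do not come from the record.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=1.0):
    """Gaussian kernel between the rows of A and the rows of B (assumed input kernel)."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def normalized_linear_kernel(A, B):
    """Cosine similarity between rows of A and rows of B (assumed output kernel)."""
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_unit @ B_unit.T


def fit(X_train, lam=0.01, gamma=1.0):
    """Phase 1 setup: build the regularized input Gram matrix K_x + lam * I."""
    K_x = gaussian_kernel(X_train, X_train, gamma)
    return K_x + lam * np.eye(len(X_train))


def predict(reg_matrix, X_train, Y_train, x_new, candidates, gamma=1.0):
    """Score each candidate output for one new input and return the best index.

    Phase 1: alpha = (K_x + lam * I)^{-1} k_x(x_new) gives the weights of the
    predicted output feature vector as a combination of training outputs.
    Phase 2 (preimage over a finite candidate set): score candidate y by
    sum_i alpha_i * k_y(y_i, y) and keep the highest-scoring one.
    """
    k_x = gaussian_kernel(X_train, x_new[None, :], gamma)[:, 0]
    alpha = np.linalg.solve(reg_matrix, k_x)
    K_y = normalized_linear_kernel(Y_train, candidates)  # shape (n_train, n_candidates)
    scores = alpha @ K_y
    return int(np.argmax(scores)), scores


# Toy data: rows of X_train stand in for spectra, rows of Y_train for
# binary molecular fingerprints of the matching molecules.
X_train = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
Y_train = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]])

# Hypothetical candidate molecules, for example retrieved from a structure database.
candidates = np.array([
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
])

reg_matrix = fit(X_train)
x_new = np.array([0.1, 0.9, 0.0])  # a query close to the second training input
best, scores = predict(reg_matrix, X_train, Y_train, x_new, candidates)
print("best candidate index:", best)
print("scores:", np.round(scores, 3))
```

In this sketch training amounts to forming one regularized Gram matrix, and each prediction needs one linear solve plus one kernel evaluation per candidate, which gives a rough sense of why a method of this shape can be fast. The speed-ups reported in the description refer to the authors' own experiments, not to this toy example.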