Fast metabolite identification with Input Output Kernel Regression
Motivation: An important problem in metabolomics is identifying metabolites from tandem mass spectrometry data. Machine learning methods have recently been proposed to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular struc...
Main authors: | Céline Brouard, Huibin Shen, Kai Dührkop, Florence d'Alché-Buc, Sebastian Böcker, Juho Rousu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2016 |
Subjects: | Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908330/ https://www.ncbi.nlm.nih.gov/pubmed/27307628 http://dx.doi.org/10.1093/bioinformatics/btw246 |
_version_ | 1782437660738977792 |
---|---|
author | Brouard, Céline Shen, Huibin Dührkop, Kai d'Alché-Buc, Florence Böcker, Sebastian Rousu, Juho |
author_facet | Brouard, Céline Shen, Huibin Dührkop, Kai d'Alché-Buc, Florence Böcker, Sebastian Rousu, Juho |
author_sort | Brouard, Céline |
collection | PubMed |
description | Motivation: An important problem in metabolomics is identifying metabolites from tandem mass spectrometry data. Machine learning methods have recently been proposed to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work, we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output spaces and can handle structured output spaces such as the molecule space. Results: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. The method approximates the spectra-molecule mapping in two phases. The first phase is a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, which consists of mapping the predicted output feature vectors back to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method decreases the running times of the training and test steps by several orders of magnitude compared with preceding methods. Availability and implementation: Contact: celine.brouard@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-4908330 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4908330 2016-06-17 Fast metabolite identification with Input Output Kernel Regression Brouard, Céline Shen, Huibin Dührkop, Kai d'Alché-Buc, Florence Böcker, Sebastian Rousu, Juho Bioinformatics Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida Motivation: An important problem in metabolomics is identifying metabolites from tandem mass spectrometry data. Machine learning methods have recently been proposed to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work, we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output spaces and can handle structured output spaces such as the molecule space. Results: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. The method approximates the spectra-molecule mapping in two phases. The first phase is a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, which consists of mapping the predicted output feature vectors back to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method decreases the running times of the training and test steps by several orders of magnitude compared with preceding methods. Availability and implementation: Contact: celine.brouard@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online. 
Oxford University Press 2016-06-15 2016-06-11 /pmc/articles/PMC4908330/ /pubmed/27307628 http://dx.doi.org/10.1093/bioinformatics/btw246 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida Brouard, Céline Shen, Huibin Dührkop, Kai d'Alché-Buc, Florence Böcker, Sebastian Rousu, Juho Fast metabolite identification with Input Output Kernel Regression |
title | Fast metabolite identification with Input Output Kernel Regression |
title_full | Fast metabolite identification with Input Output Kernel Regression |
title_fullStr | Fast metabolite identification with Input Output Kernel Regression |
title_full_unstemmed | Fast metabolite identification with Input Output Kernel Regression |
title_short | Fast metabolite identification with Input Output Kernel Regression |
title_sort | fast metabolite identification with input output kernel regression |
topic | Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908330/ https://www.ncbi.nlm.nih.gov/pubmed/27307628 http://dx.doi.org/10.1093/bioinformatics/btw246 |
work_keys_str_mv | AT brouardceline fastmetaboliteidentificationwithinputoutputkernelregression AT shenhuibin fastmetaboliteidentificationwithinputoutputkernelregression AT duhrkopkai fastmetaboliteidentificationwithinputoutputkernelregression AT dalchebucflorence fastmetaboliteidentificationwithinputoutputkernelregression AT bockersebastian fastmetaboliteidentificationwithinputoutputkernelregression AT rousujuho fastmetaboliteidentificationwithinputoutputkernelregression |
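
The two phases named in the abstract (regression into the output feature space, then a preimage step over candidate molecules) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a kernel ridge regression closed form for the first phase and a preimage step that ranks a finite candidate set by the inner product between each candidate's output feature vector and the prediction, which reduces to kernel evaluations only. The function names (`linear_kernel`, `cosine_kernel`, `iokr_scores`), the choice of kernels, the regularization value, and the toy "spectra" and "fingerprints" are all hypothetical.

```python
import numpy as np

def linear_kernel(A, B):
    """Input kernel (assumed for illustration): plain dot products."""
    return A @ B.T

def cosine_kernel(A, B):
    """Output kernel (assumed): normalized dot product.
    Rows must be non-zero vectors, which holds for the toy data below."""
    A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_n = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_n @ B_n.T

def iokr_scores(K_x, k_x_new, K_y_cand, lam):
    """Score candidates for one test input.

    Phase 1 (regression): solve (K_x + lam * I) w = k_x_new, so the
    prediction in output feature space is a w-weighted combination of
    the training output feature vectors.
    Phase 2 (preimage): the inner product between a candidate's feature
    vector and that prediction equals k_y(candidate, Y_train) @ w.
    """
    n = K_x.shape[0]
    weights = np.linalg.solve(K_x + lam * np.eye(n), k_x_new)
    return K_y_cand @ weights

# Toy training set: 4 "spectra" (rows of X_train) paired with 4 binary
# "fingerprints" (rows of Y_train). Purely illustrative values.
X_train = np.eye(4)
Y_train = np.array([[1, 0, 0, 1],
                    [0, 1, 0, 1],
                    [0, 0, 1, 1],
                    [1, 1, 0, 0]], dtype=float)

# A test spectrum close to the second training spectrum, and three
# candidate molecules to rank.
x_new = np.array([[0.1, 0.9, 0.0, 0.1]])
candidates = np.array([[1, 0, 0, 1],
                       [0, 1, 0, 1],
                       [1, 1, 1, 1]], dtype=float)

K_x = linear_kernel(X_train, X_train)          # (4, 4)
k_x_new = linear_kernel(X_train, x_new)[:, 0]  # (4,)
K_y_cand = cosine_kernel(candidates, Y_train)  # (3, 4)

scores = iokr_scores(K_x, k_x_new, K_y_cand, lam=0.1)
best = int(np.argmax(scores))
print("candidate scores:", scores)
print("best candidate index:", best)
```

In this sketch the linear system depends only on the training inputs and the test spectrum, so scoring each additional candidate costs one kernel row and one dot product. That is one plausible reading of why a kernel-based preimage step can be fast, but the record itself does not describe the implementation details.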