ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra

MOTIVATION: Metabolite identification is an important task in metabolomics for advancing knowledge of biological systems. A number of machine learning-based methods have been proposed for this task; they predict the chemical structure for a given spectrum through an intermediate (chemical struc...

Bibliographic Details
Main Authors: Nguyen, Dai Hai; Nguyen, Canh Hao; Mamitsuka, Hiroshi
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612897/
https://www.ncbi.nlm.nih.gov/pubmed/31510641
http://dx.doi.org/10.1093/bioinformatics/btz319
_version_ 1783432960689569792
author Nguyen, Dai Hai
Nguyen, Canh Hao
Mamitsuka, Hiroshi
author_facet Nguyen, Dai Hai
Nguyen, Canh Hao
Mamitsuka, Hiroshi
author_sort Nguyen, Dai Hai
collection PubMed
description MOTIVATION: Metabolite identification is an important task in metabolomics for advancing knowledge of biological systems. A number of machine learning-based methods have been proposed for this task; they predict the chemical structure for a given spectrum through an intermediate (chemical structure) representation called molecular fingerprints. These methods usually have two steps: (i) predicting fingerprints from spectra; (ii) searching a database for chemical compounds corresponding to the predicted fingerprints. Fingerprints are feature vectors that are usually very large, so as to cover all possible substructures and chemical properties, and are therefore heavily redundant, in the sense that many of the encoded molecular (sub)structures are irrelevant to the task; this limits predictive performance and slows prediction. RESULTS: We propose ADAPTIVE, which has two parts: learning two mappings (i) from structures to molecular vectors and (ii) from spectra to molecular vectors. The first part learns molecular vectors for metabolites from the given data, so that they are consistent with both the spectra and the chemical structures of metabolites. In more detail, molecular vectors are generated by a model parameterized by a message passing neural network, and its parameters are estimated by maximizing the correlation between molecular vectors and the corresponding spectra in terms of the Hilbert-Schmidt Independence Criterion. Molecular vectors generated by this model are compact and, importantly, adaptive (specific) to both the given data and the task of metabolite identification. The second part uses input output kernel regression (IOKR), the current cutting-edge method for metabolite identification. We empirically confirmed the effectiveness of ADAPTIVE on benchmark data, where ADAPTIVE outperformed the original IOKR in both predictive performance and computational efficiency.
AVAILABILITY AND IMPLEMENTATION: The code will be accessible through http://www.bic.kyoto-u.ac.jp/pathway/tools/ADAPTIVE after the acceptance of this article.
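The abstract says that parameters are estimated by maximizing dependence between molecular vectors and spectra as measured by the Hilbert-Schmidt Independence Criterion (HSIC). As a rough illustration only, the sketch below computes one common biased empirical HSIC estimate, tr(KHLH)/(n-1)^2, for two small sets of plain feature vectors. It is not the authors' implementation: the Gaussian kernel, the bandwidths, the function names and the toy data are all assumptions made for this example, and no message passing neural network or optimization step is included.

```python
import math

def gaussian_kernel(points, sigma=1.0):
    """Gram matrix with K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(points)
    return [
        [
            math.exp(
                -sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                / (2.0 * sigma ** 2)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]

def center(gram):
    """Double-center a Gram matrix: H K H, where H = I - (1/n) * ones."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [gram[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]

def hsic(xs, ys, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must be paired and contain at least 2 samples")
    k_centered = center(gaussian_kernel(xs, sigma_x))
    l_gram = gaussian_kernel(ys, sigma_y)
    # trace(K H L H) equals trace((H K H) L) by cyclicity of the trace.
    trace = sum(k_centered[i][j] * l_gram[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2

# Hypothetical toy data: "spectra" features and two candidate vector sets.
spectra = [[0.0], [1.0], [2.0], [3.0]]
dependent_vectors = [[0.1], [1.1], [1.9], [3.2]]   # tracks the spectra
constant_vectors = [[5.0], [5.0], [5.0], [5.0]]    # carries no information

print("HSIC (dependent):", hsic(spectra, dependent_vectors))
print("HSIC (constant): ", hsic(spectra, constant_vectors))
```

In this toy setting the vectors that track the spectra give a larger HSIC value than the constant ones, which is the quantity a learning procedure of the kind described in the abstract would try to increase.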
format Online
Article
Text
id pubmed-6612897
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-66128972019-07-12 ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra Nguyen, Dai Hai Nguyen, Canh Hao Mamitsuka, Hiroshi Bioinformatics Ismb/Eccb 2019 Conference Proceedings MOTIVATION: Metabolite identification is an important task in metabolomics to enhance the knowledge of biological systems. There have been a number of machine learning-based methods proposed for this task, which predict a chemical structure of a given spectrum through an intermediate (chemical structure) representation called molecular fingerprints. They usually have two steps: (i) predicting fingerprints from spectra; (ii) searching chemical compounds (in database) corresponding to the predicted fingerprints. Fingerprints are feature vectors, which are usually very large to cover all possible substructures and chemical properties, and therefore heavily redundant, in the sense of having many molecular (sub)structures irrelevant to the task, causing limited predictive performance and slow prediction. RESULTS: We propose ADAPTIVE, which has two parts: learning two mappings (i) from structures to molecular vectors and (ii) from spectra to molecular vectors. The first part learns molecular vectors for metabolites from given data, to be consistent with both spectra and chemical structures of metabolites. In more detail, molecular vectors are generated by a model, being parameterized by a message passing neural network, and parameters are estimated by maximizing the correlation between molecular vectors and the corresponding spectra in terms of Hilbert-Schmidt Independence Criterion. Molecular vectors generated by this model are compact and importantly adaptive (specific) to both given data and task of metabolite identification. The second part uses input output kernel regression (IOKR), the current cutting-edge method of metabolite identification. 
We empirically confirmed the effectiveness of ADAPTIVE by using a benchmark data, where ADAPTIVE outperformed the original IOKR in both predictive performance and computational efficiency. AVAILABILITY AND IMPLEMENTATION: The code will be accessed through http://www.bic.kyoto-u.ac.jp/pathway/tools/ADAPTIVE after the acceptance of this article. Oxford University Press 2019-07 2019-07-05 /pmc/articles/PMC6612897/ /pubmed/31510641 http://dx.doi.org/10.1093/bioinformatics/btz319 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Ismb/Eccb 2019 Conference Proceedings
Nguyen, Dai Hai
Nguyen, Canh Hao
Mamitsuka, Hiroshi
ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra
title ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra
title_full ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra
title_fullStr ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra
title_full_unstemmed ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra
title_short ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra
title_sort adaptive: learning data-dependent, concise molecular vectors for fast, accurate metabolite identification from tandem mass spectra
topic Ismb/Eccb 2019 Conference Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612897/
https://www.ncbi.nlm.nih.gov/pubmed/31510641
http://dx.doi.org/10.1093/bioinformatics/btz319
work_keys_str_mv AT nguyendaihai adaptivelearningdatadependentconcisemolecularvectorsforfastaccuratemetaboliteidentificationfromtandemmassspectra
AT nguyencanhhao adaptivelearningdatadependentconcisemolecularvectorsforfastaccuratemetaboliteidentificationfromtandemmassspectra
AT mamitsukahiroshi adaptivelearningdatadependentconcisemolecularvectorsforfastaccuratemetaboliteidentificationfromtandemmassspectra