
Multiple Compounds Recognition from The Tandem Mass Spectral Data Using Convolutional Neural Network

Mixture analysis can provide more information than the analysis of individual components. It is important to detect the different compounds in real, complex samples. However, mixtures are often contaminated by impurities and noise, which reduces accuracy. Purification and denoising will cost a lot of algorithm t...


Bibliographic Details
Main authors: Lv, Jiali, Wei, Jian, Wang, Zhenyu, Cao, Jin
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6943725/
https://www.ncbi.nlm.nih.gov/pubmed/31847456
http://dx.doi.org/10.3390/molecules24244590
_version_ 1783484939803557888
author Lv, Jiali
Wei, Jian
Wang, Zhenyu
Cao, Jin
author_facet Lv, Jiali
Wei, Jian
Wang, Zhenyu
Cao, Jin
author_sort Lv, Jiali
collection PubMed
description Mixture analysis can provide more information than the analysis of individual components. It is important to detect the different compounds in real, complex samples. However, mixtures are often contaminated by impurities and noise, which reduces accuracy. Purification and denoising cost a lot of algorithm time. In this paper, we propose a model based on a convolutional neural network (CNN) that can analyze the chemical peak information in tandem mass spectrometry (MS/MS) data. Compared with traditional analysis methods, the CNN reduces the number of data preprocessing steps. The model can extract features of different compounds and classify multi-label mass spectral data. On MS data of mixtures based on the Human Metabolome Database (HMDB), the accuracy reaches 98%. Of 600 MS test data, 451 were fully detected (true positive), 142 were partially found (false positive), and 7 were falsely predicted (true negative). In comparison, the numbers of true positive test data for support vector machine (SVM) with principal component analysis (PCA), deep neural network (DNN), long short-term memory (LSTM), and XGBoost are 282, 293, 270, and 402, respectively; the numbers of false positive test data for the four models are 318, 284, 198, and 168; and the numbers of true negative test data are 0, 23, 132, and 30. Compared with models proposed in other literature, the accuracy and performance of the CNN improved considerably by separating the independent MS/MS data of the different compounds through a three-channel input architecture. In the future, inputting MS data from different instruments and adding more offset MS data should give CNN models stronger universality.
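The three outcome categories named in the description (fully detected, partially found, falsely predicted) can be illustrated with a minimal Python sketch. The bucket definitions are an assumption, since the record does not define them precisely: a spectrum counts as fully detected when the predicted compound set equals the true set, partially found when the two sets overlap without being equal, and falsely predicted when they share no compound. The function name `categorize` and the compound names in the usage note are hypothetical; only the CNN counts come from the record.

```python
from collections import Counter


def categorize(true_compounds, predicted_compounds):
    """Bucket one multi-label prediction (assumed definitions, see lead-in)."""
    true_set = set(true_compounds)
    pred_set = set(predicted_compounds)
    if pred_set == true_set:
        return "fully_detected"
    if true_set & pred_set:
        # At least one true compound was found, but the sets differ.
        return "partially_found"
    return "falsely_predicted"


def tally(pairs):
    """Count the outcome categories over (true, predicted) pairs."""
    return Counter(categorize(t, p) for t, p in pairs)


# CNN counts as reported in the description, out of 600 test spectra.
CNN_REPORTED = {
    "fully_detected": 451,
    "partially_found": 142,
    "falsely_predicted": 7,
}

# The three categories should account for every test spectrum.
assert sum(CNN_REPORTED.values()) == 600
```

For example, `tally([({"glucose", "alanine"}, {"glucose"})])` places that single spectrum in the `partially_found` bucket, because one of the two true compounds was recovered.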
format Online
Article
Text
id pubmed-6943725
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-69437252020-01-10 Multiple Compounds Recognition from The Tandem Mass Spectral Data Using Convolutional Neural Network Lv, Jiali Wei, Jian Wang, Zhenyu Cao, Jin Molecules Article Mixture analysis can provide more information than the analysis of individual components. It is important to detect the different compounds in real, complex samples. However, mixtures are often contaminated by impurities and noise, which reduces accuracy. Purification and denoising cost a lot of algorithm time. In this paper, we propose a model based on a convolutional neural network (CNN) that can analyze the chemical peak information in tandem mass spectrometry (MS/MS) data. Compared with traditional analysis methods, the CNN reduces the number of data preprocessing steps. The model can extract features of different compounds and classify multi-label mass spectral data. On MS data of mixtures based on the Human Metabolome Database (HMDB), the accuracy reaches 98%. Of 600 MS test data, 451 were fully detected (true positive), 142 were partially found (false positive), and 7 were falsely predicted (true negative). In comparison, the numbers of true positive test data for support vector machine (SVM) with principal component analysis (PCA), deep neural network (DNN), long short-term memory (LSTM), and XGBoost are 282, 293, 270, and 402, respectively; the numbers of false positive test data for the four models are 318, 284, 198, and 168; and the numbers of true negative test data are 0, 23, 132, and 30. Compared with models proposed in other literature, the accuracy and performance of the CNN improved considerably by separating the independent MS/MS data of the different compounds through a three-channel input architecture. In the future, inputting MS data from different instruments and adding more offset MS data should give CNN models stronger universality.
MDPI 2019-12-15 /pmc/articles/PMC6943725/ /pubmed/31847456 http://dx.doi.org/10.3390/molecules24244590 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Lv, Jiali
Wei, Jian
Wang, Zhenyu
Cao, Jin
Multiple Compounds Recognition from The Tandem Mass Spectral Data Using Convolutional Neural Network
title Multiple Compounds Recognition from The Tandem Mass Spectral Data Using Convolutional Neural Network
title_full Multiple Compounds Recognition from The Tandem Mass Spectral Data Using Convolutional Neural Network
title_fullStr Multiple Compounds Recognition from The Tandem Mass Spectral Data Using Convolutional Neural Network
title_full_unstemmed Multiple Compounds Recognition from The Tandem Mass Spectral Data Using Convolutional Neural Network
title_short Multiple Compounds Recognition from The Tandem Mass Spectral Data Using Convolutional Neural Network
title_sort multiple compounds recognition from the tandem mass spectral data using convolutional neural network
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6943725/
https://www.ncbi.nlm.nih.gov/pubmed/31847456
http://dx.doi.org/10.3390/molecules24244590
work_keys_str_mv AT lvjiali multiplecompoundsrecognitionfromthetandemmassspectraldatausingconvolutionalneuralnetwork
AT weijian multiplecompoundsrecognitionfromthetandemmassspectraldatausingconvolutionalneuralnetwork
AT wangzhenyu multiplecompoundsrecognitionfromthetandemmassspectraldatausingconvolutionalneuralnetwork
AT caojin multiplecompoundsrecognitionfromthetandemmassspectraldatausingconvolutionalneuralnetwork