Research progress and application of retention time prediction methods based on deep learning (基于深度学习的保留时间预测方法的研究进展及应用)

Bibliographic Details
Main Authors: DU, Zhuokun; SHAO, Wei; QIN, Weijie
Format: Online Article Text
Language: English
Published: Editorial board of Chinese Journal of Chromatography 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9403805/
https://www.ncbi.nlm.nih.gov/pubmed/34227303
http://dx.doi.org/10.3724/SP.J.1123.2020.08015
author DU, Zhuokun
SHAO, Wei
QIN, Weijie
collection PubMed
description In the “shotgun” proteomics strategy, the proteome is characterized by analyzing tryptic peptides using liquid chromatography-mass spectrometry. In this strategy, the retention time of a peptide during liquid chromatography separation can be predicted from its sequence, which makes it a useful feature for peptide identification. Retention time prediction has therefore attracted considerable research attention. Traditional methods calculate the physical and chemical properties of peptides from their amino acid sequences to obtain retention times under specific chromatography conditions; however, these methods cannot be applied directly to other chromatography conditions, nor can they be used across laboratories or instrument platforms. To solve this problem, deep learning has been introduced into proteomics research for retention time prediction in recent years. Deep learning is an advanced machine-learning method with a strong capability to learn complex relationships from large-scale data. By stacking multiple hidden layers, deep neural networks can ingest raw data without manually designed features. Transfer learning is an important method in deep learning: it improves learning on a new task by transferring knowledge from a related task that has already been learned. It allows models trained on large datasets to be used across conditions by fine-tuning on smaller datasets instead of retraining the whole model. Many deep learning-based retention time prediction methods have been developed. During model training, peptide sequences are encoded to represent peptide information. The model then learns the relationship between the characteristics of the peptides and their corresponding retention times, without manual input of the physical and chemical properties of the peptides.
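The encoding step described above can be illustrated with a minimal sketch. It assumes one-hot encoding over the 20 standard amino acids with zero-padding to a fixed length, which is one common choice; the record does not state which encoding any particular method uses. The names `AMINO_ACIDS`, `AA_INDEX`, `one_hot_encode`, and `max_len` are hypothetical and chosen only for illustration.

```python
# Minimal sketch (assumption): one-hot encoding of a peptide sequence.
# Not taken from any specific retention time prediction tool.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide, max_len=30):
    """Return a max_len x 20 matrix (list of lists) for one peptide.

    Row i holds a single 1 in the column of the residue at position i.
    Rows beyond the peptide length stay all-zero (padding).
    """
    if len(peptide) > max_len:
        raise ValueError("peptide is longer than max_len")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(peptide):
        if aa not in AA_INDEX:
            raise ValueError(f"unsupported residue: {aa!r}")
        matrix[pos][AA_INDEX[aa]] = 1
    return matrix


if __name__ == "__main__":
    encoded = one_hot_encode("PEPTIDE", max_len=10)
    for row in encoded:
        print("".join(str(v) for v in row))
```

A matrix like this is what a network would receive in place of hand-computed physicochemical descriptors. Modified residues are rejected in this sketch; supporting them would require extending the alphabet.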
Compared with traditional methods, deep learning methods achieve higher accuracy and can be readily applied under different chromatography conditions through transfer learning. If there are not enough data to train a new model, a model trained on other datasets can be used instead after calibration with a small dataset obtained under the target chromatography conditions. Although the retention times of modified peptides can also be predicted, the predictions are inadequate for complex modifications such as glycosylation; this remains one of the main problems to be solved. Predicted retention times are used for quality control of peptide identification. When prediction accuracy is high, the predicted retention times can be treated as actual retention times. The difference between predicted and observed retention times can therefore serve as an effective and unbiased quantitative metric for evaluating the quality of peptide-spectrum matches (PSMs) reported by different peptide identification methods. Combined with fragment ion intensity prediction, retention time prediction is also used to generate spectral libraries for data-independent acquisition (DIA)-based mass spectrometry analysis. DIA methods generally identify peptides using specific spectral libraries obtained from data-dependent acquisition (DDA) experiments. As a result, only peptides detected in the DDA experiments are present in the libraries and can be detected in DIA. Furthermore, building libraries from DDA experiments takes considerable time and effort, and such libraries typically cannot be transferred across laboratories or instrument platforms. In contrast, pseudo spectral libraries generated from predicted retention times and fragment ion intensities overcome these shortcomings, because they provide theoretical spectra of all possible peptides without the need for DDA experiments.
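The calibration and quality-control ideas above can be sketched as follows. This is a simplified stand-in, not the method of any reviewed tool: instead of fine-tuning a network, it assumes a straight-line (least-squares) mapping from predicted to observed retention times fitted on a small set of confidently identified peptides, and then reports the residual for each PSM. The function names and the example numbers are hypothetical.

```python
# Minimal sketch (assumption): linear calibration of predicted retention
# times against a small calibration set, followed by delta-RT scoring.
# Linear calibration is a simplification swapped in for model fine-tuning.


def fit_linear_calibration(predicted, observed):
    """Least-squares fit of observed = slope * predicted + intercept."""
    n = len(predicted)
    if n < 2 or n != len(observed):
        raise ValueError("need two or more paired values")
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predicted values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def delta_rt(predicted, observed, slope, intercept):
    """Observed minus calibrated predicted retention time, per PSM."""
    return [obs - (slope * pred + intercept)
            for pred, obs in zip(predicted, observed)]


if __name__ == "__main__":
    # Hypothetical calibration peptides (times in minutes).
    slope, intercept = fit_linear_calibration([10.0, 20.0, 30.0],
                                              [25.0, 45.0, 65.0])
    # Hypothetical PSMs to evaluate; a large |delta| suggests a doubtful match.
    print(delta_rt([40.0, 15.0], [90.0, 35.2], slope, intercept))
```

A PSM whose absolute delta is much larger than the spread seen on the calibration peptides would be a candidate for rejection; the threshold itself is a choice left to the analyst.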
This paper reviews the research progress of deep learning methods for retention time prediction and their related applications, in order to provide a reference for retention time prediction and protein identification. The development directions and application trends of deep learning-based retention time prediction methods are also discussed.
format Online
Article
Text
id pubmed-9403805
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Editorial board of Chinese Journal of Chromatography
record_format MEDLINE/PubMed
spelling pubmed-9403805 2022-09-14
Research progress and application of retention time prediction methods based on deep learning (基于深度学习的保留时间预测方法的研究进展及应用)
DU, Zhuokun
SHAO, Wei
QIN, Weijie
Se Pu
Reviews
Editorial board of Chinese Journal of Chromatography 2021-03-08
/pmc/articles/PMC9403805/ /pubmed/34227303 http://dx.doi.org/10.3724/SP.J.1123.2020.08015
Text en
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Research progress and application of retention time prediction methods based on deep learning (基于深度学习的保留时间预测方法的研究进展及应用)
topic Reviews
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9403805/
https://www.ncbi.nlm.nih.gov/pubmed/34227303
http://dx.doi.org/10.3724/SP.J.1123.2020.08015