MS2CNN: predicting MS/MS spectrum based on protein sequence using deep convolutional neural networks

Bibliographic Details
Main authors: Lin, Yang-Ming, Chen, Ching-Tai, Chang, Jia-Ming
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929458/
https://www.ncbi.nlm.nih.gov/pubmed/31874640
http://dx.doi.org/10.1186/s12864-019-6297-6
Description
Summary: BACKGROUND: Tandem mass spectrometry allows biologists to identify and quantify protein samples in the form of digested peptide sequences. When performing peptide identification, spectral library search is more sensitive than traditional database search but is limited to peptides that have been previously identified. An accurate tandem mass spectrum prediction tool is thus crucial in expanding the peptide space and increasing the coverage of spectral library search. RESULTS: We propose MS(2)CNN, a non-linear regression model based on deep convolutional neural networks, a deep learning algorithm. The features for our model are amino acid composition, predicted secondary structure, and physical-chemical features such as isoelectric point, aromaticity, helicity, hydrophobicity, and basicity. MS(2)CNN was trained with five-fold cross validation on a three-way data split on the large-scale human HCD MS(2) dataset of Orbitrap LC-MS/MS downloaded from the National Institute of Standards and Technology. It was then evaluated on a publicly available independent test dataset of human HeLa cell lysate from LC-MS experiments. On average, our model shows better cosine similarity and Pearson correlation coefficient (0.690 and 0.632) than MS(2)PIP (0.647 and 0.601) and is comparable with pDeep (0.692 and 0.642). Notably, for the more complex MS(2) spectra of 3+ peptides, MS(2)CNN is significantly better than both MS(2)PIP and pDeep. CONCLUSIONS: We showed that MS(2)CNN outperforms MS(2)PIP for 2+ and 3+ peptides and pDeep for 3+ peptides. This implies that MS(2)CNN, the proposed convolutional neural network model, generates highly accurate MS(2) spectra for LC-MS/MS experiments using Orbitrap machines, which can be of great help in protein and peptide identifications. The results suggest that incorporating more data for the deep learning model may improve performance.
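Note: the summary compares models by cosine similarity and Pearson correlation coefficient between predicted and observed spectra. The following is a minimal sketch of how those two metrics could be computed for a single peptide, assuming each spectrum is represented as a list of fragment-ion intensities in matching order. The function names and the vector representation are illustrative assumptions and are not taken from the MS(2)CNN code.

```python
import math


def cosine_similarity(predicted, observed):
    """Cosine of the angle between two intensity vectors (hypothetical helper)."""
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0 or norm_o == 0:
        # Assumption: an all-zero spectrum is treated as having no similarity.
        return 0.0
    return dot / (norm_p * norm_o)


def pearson_correlation(predicted, observed):
    """Pearson correlation coefficient between two intensity vectors (hypothetical helper)."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        # Assumption: a constant spectrum is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_p * var_o)
```

The two metrics differ in that Pearson correlation centers each vector on its mean before comparing, so a constant offset in predicted intensities lowers cosine similarity but leaves the correlation unchanged.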