
Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics

Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing using bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models...


Bibliographic Details
Main Authors: Gueto-Tettay, Carlos; Tang, Di; Happonen, Lotta; Heusel, Moritz; Khakzad, Hamed; Malmström, Johan; Malmström, Lars
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9891523/
https://www.ncbi.nlm.nih.gov/pubmed/36668672
http://dx.doi.org/10.1371/journal.pcbi.1010457
author Gueto-Tettay, Carlos
Tang, Di
Happonen, Lotta
Heusel, Moritz
Khakzad, Hamed
Malmström, Johan
Malmström, Lars
author_facet Gueto-Tettay, Carlos
Tang, Di
Happonen, Lotta
Heusel, Moritz
Khakzad, Hamed
Malmström, Johan
Malmström, Lars
author_sort Gueto-Tettay, Carlos
collection PubMed
description Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing using bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows requires machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performance using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of peptides with highly diverse N- and C-termini generalized 76% better than the termini-restricted ones. Moreover, expanding the training set by adding peptides from the multienzymatic digestion, with five proteases, of samples from several species led to a 2–3-fold gain in generalizability. Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial antibodies (mAbs). MEMs extracted over 10,000 matching and overlapping peptides across the mAb samples from six different proteases, achieving 100% sequence coverage for eight of the ten polypeptide chains. We anticipate that the MEMs' proven improvements to de novo analysis will positively impact several applications, such as the analysis of samples of high complexity or unknown nature, and the peptidomics field.
format Online
Article
Text
id pubmed-9891523
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9891523 2023-02-02 Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics Gueto-Tettay, Carlos Tang, Di Happonen, Lotta Heusel, Moritz Khakzad, Hamed Malmström, Johan Malmström, Lars PLoS Comput Biol Research Article Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing using bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows requires machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performance using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of peptides with highly diverse N- and C-termini generalized 76% better than the termini-restricted ones. Moreover, expanding the training set by adding peptides from the multienzymatic digestion, with five proteases, of samples from several species led to a 2–3-fold gain in generalizability. Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial antibodies (mAbs). MEMs extracted over 10,000 matching and overlapping peptides across the mAb samples from six different proteases, achieving 100% sequence coverage for eight of the ten polypeptide chains.
We anticipate that the MEMs' proven improvements to de novo analysis will positively impact several applications, such as the analysis of samples of high complexity or unknown nature, and the peptidomics field. Public Library of Science 2023-01-20 /pmc/articles/PMC9891523/ /pubmed/36668672 http://dx.doi.org/10.1371/journal.pcbi.1010457 Text en © 2023 Gueto-Tettay et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Gueto-Tettay, Carlos
Tang, Di
Happonen, Lotta
Heusel, Moritz
Khakzad, Hamed
Malmström, Johan
Malmström, Lars
Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics
title Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9891523/
https://www.ncbi.nlm.nih.gov/pubmed/36668672
http://dx.doi.org/10.1371/journal.pcbi.1010457