
Fast alignment of mass spectra in large proteomics datasets, capturing dissimilarities arising from multiple complex modifications of peptides

Bibliographic Details
Main authors: Prunier, Grégoire; Cherkaoui, Mehdi; Lysiak, Albane; Langella, Olivier; Blein-Nicolas, Mélisande; Lollier, Virginie; Benoist, Emile; Jean, Géraldine; Fertin, Guillaume; Rogniaux, Hélène; Tessier, Dominique
Format: Online article, text
Language: English
Published: BioMed Central, 2023
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10631047/
https://www.ncbi.nlm.nih.gov/pubmed/37940845
http://dx.doi.org/10.1186/s12859-023-05555-y
Collection: PubMed
Abstract:
BACKGROUND: In proteomics, the interpretation of mass spectra representing peptides carrying multiple complex modifications remains challenging, as it is difficult to strike a balance between reasonable execution time, a limited number of false positives, and a huge search space allowing any number of modifications without a priori assumptions. The scientific community needs new developments in this area to aid in the discovery of novel post-translational modifications that may play important roles in disease.
RESULTS: To make progress on this issue, we implemented SpecGlobX (SpecGlob eXTended to eXperimental spectra), a standalone Java application that quickly determines the best spectral alignments of a (possibly very large) list of Peptide-to-Spectrum Matches (PSMs) provided by any open modification search method, or generated by the user. As input, SpecGlobX reads a file containing spectra in MGF or mzML format and a semicolon-delimited spreadsheet describing the PSMs. As output, SpecGlobX returns the best alignment for each PSM, splitting the mass difference between the spectrum and the peptide into one or more shifts while considering the possibility of non-aligned masses (a phenomenon resulting from many situations, including neutral losses). SpecGlobX is fast: it can align one million PSMs in about 1.5 min on a standard desktop computer. First, we recall the foundations of the algorithm and detail how we adapted SpecGlob (the method we previously developed with the same aim, but limited to the interpretation of perfect simulated spectra) to the interpretation of imperfect experimental spectra. Then, we highlight the value of SpecGlobX as a complementary tool downstream of three open modification search methods on a large simulated spectra dataset. Finally, we ran SpecGlobX on a proteome-wide dataset downloaded from PRIDE to demonstrate that SpecGlobX works just as well on experimental spectra as on simulated ones. We then carefully analyzed a limited set of interpretations.
CONCLUSIONS: SpecGlobX is helpful as a decision support tool, providing clues for interpreting peptides carrying complex modifications that are still poorly handled by current open modification search software. Better alignment of PSMs enhances confidence in the identification of spectra provided by open modification search methods and should improve the interpretation rate of spectra.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05555-y.
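The central quantity in the abstract is the mass difference between a spectrum and its candidate peptide, which SpecGlobX splits into one or more shifts. As a minimal sketch (not SpecGlobX's own code, which is a Java application), the Python below computes that difference for a single PSM from standard monoisotopic residue masses. It assumes an unmodified peptide sequence and a precursor m/z with a known charge state; the function names are hypothetical and chosen only for illustration.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
# (cysteine is unmodified here, i.e. no carbamidomethylation).
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once for the peptide termini (H on N-term, OH on C-term)
PROTON = 1.007276   # mass of a proton, used to neutralize the precursor m/z


def peptide_mass(seq: str) -> float:
    """Theoretical neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in seq) + WATER


def precursor_neutral_mass(mz: float, charge: int) -> float:
    """Convert an observed [M + zH]^z+ precursor m/z into a neutral mass."""
    return mz * charge - charge * PROTON


def mass_delta(seq: str, mz: float, charge: int) -> float:
    """Hypothetical helper: observed neutral mass minus theoretical peptide mass.

    A non-zero value is the total shift that an alignment step would then
    have to distribute along the peptide (as one or several shifts).
    """
    return precursor_neutral_mass(mz, charge) - peptide_mass(seq)


if __name__ == "__main__":
    print(f"{peptide_mass('PEPTIDE'):.4f}")               # theoretical mass
    print(f"{mass_delta('PEPTIDE', 440.6704135, 2):+.4f}")  # observed shift
```

For PEPTIDE observed at m/z 440.6704 with charge 2+, `mass_delta` returns roughly +79.966 Da, the mass of a phosphorylation. A tool such as SpecGlobX would then try to explain this total by placing one or more shifts (and possibly non-aligned masses) along the sequence so that the fragment peaks align best; that alignment step is not reproduced here.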
Record ID: pubmed-10631047
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Software article). Published by BioMed Central, 2023-11-08.
License: © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.