Fast alignment of mass spectra in large proteomics datasets, capturing dissimilarities arising from multiple complex modifications of peptides
BACKGROUND: In proteomics, the interpretation of mass spectra representing peptides carrying multiple complex modifications remains challenging, as it is difficult to strike a balance between reasonable execution time, a limited number of false positives, and a huge search space allowing any number...
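Main authors: | Prunier, Grégoire; Cherkaoui, Mehdi; Lysiak, Albane; Langella, Olivier; Blein-Nicolas, Mélisande; Lollier, Virginie; Benoist, Emile; Jean, Géraldine; Fertin, Guillaume; Rogniaux, Hélène; Tessier, Dominique |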
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10631047/ https://www.ncbi.nlm.nih.gov/pubmed/37940845 http://dx.doi.org/10.1186/s12859-023-05555-y |
_version_ | 1785132285551443968 |
---|---|
author | Prunier, Grégoire; Cherkaoui, Mehdi; Lysiak, Albane; Langella, Olivier; Blein-Nicolas, Mélisande; Lollier, Virginie; Benoist, Emile; Jean, Géraldine; Fertin, Guillaume; Rogniaux, Hélène; Tessier, Dominique |
author_sort | Prunier, Grégoire |
collection | PubMed |
description | BACKGROUND: In proteomics, the interpretation of mass spectra representing peptides carrying multiple complex modifications remains challenging, as it is difficult to strike a balance between reasonable execution time, a limited number of false positives, and a huge search space allowing any number of modifications without a priori knowledge. The scientific community needs new developments in this area to aid in the discovery of novel post-translational modifications that may play important roles in disease. RESULTS: To make progress on this issue, we implemented SpecGlobX (SpecGlob eXTended to eXperimental spectra), a standalone Java application that quickly determines the best spectral alignments of a (possibly very large) list of Peptide-to-Spectrum Matches (PSMs) provided by any open modification search method, or generated by the user. As input, SpecGlobX reads a file containing spectra in MGF or mzML format and a semicolon-delimited spreadsheet describing the PSMs. As output, SpecGlobX returns the best alignment for each PSM, splitting the mass difference between the spectrum and the peptide into one or more shifts while considering the possibility of non-aligned masses (a phenomenon that arises in many situations, including neutral losses). SpecGlobX is fast, able to align one million PSMs in about 1.5 min on a standard desktop. First, we recall the foundations of the algorithm and detail how we adapted SpecGlob (the method we previously developed with the same aim, but limited to the interpretation of perfect simulated spectra) to the interpretation of imperfect experimental spectra. Then, we highlight the value of SpecGlobX as a complementary tool downstream of three open modification search methods on a large simulated spectra dataset. Finally, we ran SpecGlobX on a proteome-wide dataset downloaded from PRIDE to demonstrate that SpecGlobX functions equally well on simulated and experimental spectra. We then carefully analyzed a limited set of interpretations. CONCLUSIONS: SpecGlobX is helpful as a decision support tool, providing clues for interpreting peptides carrying complex modifications that are still poorly handled by current open modification search software. Better alignment of PSMs enhances confidence in the identification of spectra provided by open modification search methods and should improve the interpretation rate of spectra. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05555-y. |
format | Online Article Text |
id | pubmed-10631047 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10631047 2023-11-07 Fast alignment of mass spectra in large proteomics datasets, capturing dissimilarities arising from multiple complex modifications of peptides Prunier, Grégoire; Cherkaoui, Mehdi; Lysiak, Albane; Langella, Olivier; Blein-Nicolas, Mélisande; Lollier, Virginie; Benoist, Emile; Jean, Géraldine; Fertin, Guillaume; Rogniaux, Hélène; Tessier, Dominique BMC Bioinformatics Software BioMed Central 2023-11-08 /pmc/articles/PMC10631047/ /pubmed/37940845 http://dx.doi.org/10.1186/s12859-023-05555-y Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Fast alignment of mass spectra in large proteomics datasets, capturing dissimilarities arising from multiple complex modifications of peptides |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10631047/ https://www.ncbi.nlm.nih.gov/pubmed/37940845 http://dx.doi.org/10.1186/s12859-023-05555-y |
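
The description above says that SpecGlobX splits "the mass difference between the spectrum and the peptide" into one or more shifts. As a rough illustration of that starting quantity only (not of the SpecGlobX alignment itself, which is not reproduced here), the following Python sketch computes the mass difference for a single PSM. The function names, the example peptide, and the simulated phosphorylation shift are assumptions made for illustration; the residue masses are standard monoisotopic values.

```python
# Minimal sketch: mass difference between an observed precursor and a peptide.
# All function names here are hypothetical; this is NOT SpecGlobX code.

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the residue sum
PROTON = 1.007276   # mass of one proton (charge carrier)


def peptide_mass(sequence: str) -> float:
    """Theoretical monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of a precursor observed at m/z with the given charge."""
    return (mz - PROTON) * charge


def mass_delta(sequence: str, mz: float, charge: int) -> float:
    """Observed minus theoretical mass: the total to be explained by shifts."""
    return neutral_mass(mz, charge) - peptide_mass(sequence)


if __name__ == "__main__":
    # Simulate a doubly charged precursor carrying one phosphorylation
    # (+79.96633 Da), an assumed example modification.
    seq = "PEPTIDE"
    mz = (peptide_mass(seq) + 79.96633) / 2 + PROTON
    print(f"theoretical mass: {peptide_mass(seq):.4f} Da")
    print(f"mass difference:  {mass_delta(seq, mz, 2):.4f} Da")
```

An open modification search reports this total difference per PSM; according to the description, SpecGlobX then decides how to distribute it along the peptide as one or more shifts.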