
Extremely fast and accurate open modification spectral library searching of high-resolution mass spectra using feature hashing and graphics processing units

Open modification searching (OMS) is a powerful search strategy to identify peptides with any type of modification. OMS works by using a very wide precursor mass window to allow modified spectra to match against their unmodified variants, after which the modification types can be inferred from the c...


Bibliographic Details
Main authors: Bittremieux, Wout; Laukens, Kris; Noble, William Stafford
Format: Online Article Text
Language: English
Published: 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6886738/
https://www.ncbi.nlm.nih.gov/pubmed/31448616
http://dx.doi.org/10.1021/acs.jproteome.9b00291
author Bittremieux, Wout
Laukens, Kris
Noble, William Stafford
collection PubMed
description Open modification searching (OMS) is a powerful search strategy to identify peptides with any type of modification. OMS works by using a very wide precursor mass window to allow modified spectra to match against their unmodified variants, after which the modification types can be inferred from the corresponding precursor mass differences. A disadvantage of this strategy, however, is the large computational cost, because each query spectrum has to be compared against a multitude of candidate peptides. We have previously introduced the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. Here we demonstrate how this candidate selection procedure can be further optimized using graphics processing units. Additionally, we introduce a feature hashing scheme to convert high-resolution spectra to low-dimensional vectors. Based on these algorithmic advances, along with low-level code optimizations, the new version of ANN-SoLo is up to an order of magnitude faster than its initial version. This makes it possible to efficiently perform open searches on a large scale to gain a deeper understanding about the protein modification landscape. We demonstrate the computational efficiency and identification performance of ANN-SoLo based on a large data set of the draft human proteome. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.
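The description above names two ideas that a small example can make concrete: hashing a high-resolution spectrum into a low-dimensional vector, and reading a modification mass off the precursor mass difference. The following is a minimal sketch under stated assumptions, not ANN-SoLo's implementation. The function names, the bin width of 0.05 m/z, the vector length of 400, the CRC32 hash, and the peak values are all hypothetical choices made for illustration.

```python
import math
import zlib


def hash_spectrum(mz_values, intensities, bin_width=0.05, dim=400):
    """Map a peak list to a unit-length vector with `dim` entries.

    bin_width and dim are illustrative values, not ANN-SoLo's settings.
    """
    vector = [0.0] * dim
    for mz, intensity in zip(mz_values, intensities):
        bin_index = int(mz / bin_width)  # narrow bin suited to high-resolution m/z
        # A deterministic hash assigns each narrow bin to one of `dim` slots.
        slot = zlib.crc32(bin_index.to_bytes(8, "little")) % dim
        vector[slot] += intensity  # bins that collide share a slot
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector


def cosine(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def precursor_mass_difference(query_mz, library_mz, charge):
    """Neutral mass difference (Da) between two precursors of equal charge."""
    return (query_mz - library_mz) * charge


# Made-up peak lists: two shared fragments and one shifted fragment.
library = hash_spectrum([175.119, 262.151, 375.235], [30.0, 100.0, 55.0])
query = hash_spectrum([175.119, 262.151, 415.218], [28.0, 95.0, 60.0])
print(f"hashed cosine similarity: {cosine(library, query):.3f}")
print(f"mass difference: {precursor_mass_difference(540.0, 500.0, 2):.1f} Da")
```

Hashing keeps the vector short even though the m/z bins are narrow, at the cost of occasional collisions between unrelated bins. Short fixed-length vectors of this kind are what a nearest neighbor index can then search to pick candidate library spectra.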
format Online
Article
Text
id pubmed-6886738
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-6886738 2020-10-04 J Proteome Res Article
2019-08-30 2019-10-04 /pmc/articles/PMC6886738/ /pubmed/31448616 http://dx.doi.org/10.1021/acs.jproteome.9b00291 Text en The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license http://creativecommons.org/licenses/by-nc-nd/4.0.