TIDD: tool-independent and data-dependent machine learning for peptide identification

BACKGROUND: In shotgun proteomics, database search engines have been developed to assign peptides to tandem mass (MS/MS) spectra, and post-processing (or rescoring) approaches applied to the search results have been proposed to increase the number of confident peptide identifications. The...

Bibliographic Details
Main authors: Li, Honglan, Na, Seungjin, Hwang, Kyu-Baek, Paek, Eunok
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8969291/
https://www.ncbi.nlm.nih.gov/pubmed/35354356
http://dx.doi.org/10.1186/s12859-022-04640-y
author Li, Honglan
Na, Seungjin
Hwang, Kyu-Baek
Paek, Eunok
collection PubMed
description BACKGROUND: In shotgun proteomics, database search engines have been developed to assign peptides to tandem mass (MS/MS) spectra, and post-processing (or rescoring) approaches applied to the search results have been proposed to increase the number of confident peptide identifications. The most popular post-processing approaches, such as Percolator and PeptideProphet, have improved peptide identification rates by using machine learning to combine multiple scores from database search engines. Existing post-processing approaches, however, are limited when dealing with results from new search engines, because their machine learning features must be optimized specifically for each search engine. RESULTS: We propose a universal post-processing tool, called TIDD, which supports confident peptide identification regardless of the search engine adopted. TIDD can work with any search engine, including newly developed ones, because it calculates universal features that assess peptide-spectrum match (PSM) quality, while also accepting additional features provided by search engines or users. Even though it relies on universal features independent of search tools, TIDD showed similar or better performance than Percolator in terms of peptide identification. For MSFragger, which is not supported by Percolator, TIDD identified 10.23–38.95% more PSMs than target-decoy estimation. TIDD also offers a simple graphical user interface. CONCLUSIONS: TIDD eliminated the need for feature engineering optimized for each database search tool and can therefore be applied directly to any database search results, including those from newly developed search engines. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04640-y.
format Online
Article
Text
id pubmed-8969291
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8969291 2022-04-01 TIDD: tool-independent and data-dependent machine learning for peptide identification Li, Honglan Na, Seungjin Hwang, Kyu-Baek Paek, Eunok BMC Bioinformatics Research BioMed Central 2022-03-30 /pmc/articles/PMC8969291/ /pubmed/35354356 http://dx.doi.org/10.1186/s12859-022-04640-y Text en © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title TIDD: tool-independent and data-dependent machine learning for peptide identification
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8969291/
https://www.ncbi.nlm.nih.gov/pubmed/35354356
http://dx.doi.org/10.1186/s12859-022-04640-y