Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics

BACKGROUND: In proteomic analysis, MS/MS spectra acquired by a mass spectrometer are assigned to peptides by database searching algorithms such as SEQUEST. Each peptide-to-spectrum assignment made by SEQUEST is characterized by several scores, including Xcorr, ΔCn, Sp, Rsp, matched...

Bibliographic Details
Main authors: Jiang, Xinning, Jiang, Xiaogang, Han, Guanghui, Ye, Mingliang, Zou, Hanfa
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2040164/
https://www.ncbi.nlm.nih.gov/pubmed/17761002
http://dx.doi.org/10.1186/1471-2105-8-323
author Jiang, Xinning
Jiang, Xiaogang
Han, Guanghui
Ye, Mingliang
Zou, Hanfa
collection PubMed
description BACKGROUND: In proteomic analysis, MS/MS spectra acquired by a mass spectrometer are assigned to peptides by database searching algorithms such as SEQUEST. Each peptide-to-spectrum assignment made by SEQUEST is characterized by several scores, including Xcorr, ΔCn, Sp, Rsp, and matched ion count. Filtering criteria built from several of these scores are used to separate correct identifications from random assignments. However, these filtering criteria have not been well optimized to date. RESULTS: In this study, we implemented a machine learning approach known as a predictive genetic algorithm (GA) to optimize the filtering criteria so as to maximize the number of identified peptides at a fixed false-discovery rate (FDR) for SEQUEST database searching. Because the FDR was determined directly by a decoy database search scheme, the GA-based optimization approach did not require any prior knowledge of the characteristics of the data set, which represents a significant advantage over statistical approaches such as PeptideProphet. Compared with PeptideProphet, the GA-based approach achieved similar performance in distinguishing true from false assignments with only 1/10 of the processing time. Moreover, the GA-based approach can be easily extended to process other database search results, as it does not rely on any assumptions about the data. CONCLUSION: Our results indicate that filtering criteria should be optimized individually for different samples. The newly developed GA-based software provides a convenient and fast way to create tailored optimal criteria for different proteome samples to improve proteome coverage.
format Text
id pubmed-2040164
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2040164 2007-10-23 Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics Jiang, Xinning Jiang, Xiaogang Han, Guanghui Ye, Mingliang Zou, Hanfa BMC Bioinformatics Research Article
BioMed Central 2007-08-31 /pmc/articles/PMC2040164/ /pubmed/17761002 http://dx.doi.org/10.1186/1471-2105-8-323 Text en Copyright © 2007 Jiang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics
topic Research Article
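The idea summarized in the abstract, choosing score cutoffs that maximize the number of accepted target identifications while a decoy-estimated FDR stays under a fixed limit, can be illustrated with a small sketch. This is not the authors' software. The record layout `(xcorr, delta_cn, is_decoy)`, the FDR estimator `decoys / targets`, the cutoff search ranges, and every function name below are assumptions made for illustration, and the search is a deliberately simple elitist genetic algorithm over two scores only.

```python
import random
from typing import List, Tuple

# Hypothetical record layout: (xcorr, delta_cn, is_decoy).
PSM = Tuple[float, float, bool]
Cutoffs = Tuple[float, float]  # (minimum Xcorr, minimum delta Cn)


def count_passing(psms: List[PSM], cutoffs: Cutoffs) -> Tuple[int, int]:
    """Return (targets, decoys) whose scores meet both cutoffs."""
    min_xcorr, min_dcn = cutoffs
    targets = decoys = 0
    for xcorr, dcn, is_decoy in psms:
        if xcorr >= min_xcorr and dcn >= min_dcn:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    return targets, decoys


def estimate_fdr(targets: int, decoys: int) -> float:
    """Assumed estimator: decoy hits divided by target hits."""
    if targets == 0:
        return 1.0 if decoys else 0.0
    return decoys / targets


def fitness(psms: List[PSM], cutoffs: Cutoffs, max_fdr: float) -> int:
    """Accepted target count, or 0 when the FDR limit is exceeded."""
    targets, decoys = count_passing(psms, cutoffs)
    return targets if estimate_fdr(targets, decoys) <= max_fdr else 0


def optimize_cutoffs(psms: List[PSM], max_fdr: float = 0.01,
                     pop_size: int = 30, generations: int = 40,
                     seed: int = 0) -> Cutoffs:
    """Elitist GA: keep the best half, refill by crossover plus mutation."""
    rng = random.Random(seed)
    # Assumed search ranges: Xcorr in [0, 5], delta Cn in [0, 0.5].
    population = [(rng.uniform(0.0, 5.0), rng.uniform(0.0, 0.5))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(psms, c, max_fdr), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover per score, then small Gaussian mutation.
            child = (max(0.0, rng.choice((a[0], b[0])) + rng.gauss(0.0, 0.1)),
                     max(0.0, rng.choice((a[1], b[1])) + rng.gauss(0.0, 0.01)))
            children.append(child)
        population = parents + children
    return max(population, key=lambda c: fitness(psms, c, max_fdr))
```

Calling `optimize_cutoffs(psms, max_fdr=0.01)` on a list of such records returns one `(min_xcorr, min_dcn)` pair, and `count_passing` then reports how many target and decoy matches survive it. A realistic filter would cover more scores (for example Sp, Rsp, and matched ion count) and would typically be tuned separately per sample, as the abstract's conclusion suggests.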