An automated proteomic data analysis workflow for mass spectrometry

BACKGROUND: Mass spectrometry-based protein identification methods are fundamental to proteomics. Biological experiments are usually performed in replicates and proteomic analyses generate huge datasets which need to be integrated and quantitatively analyzed. The Sequest™ search algorithm is a commo...

Bibliographic Details
Main Authors:	Pendarvis, Ken, Kumar, Ranjit, Burgess, Shane C, Nanduri, Bindu
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3226188/ https://www.ncbi.nlm.nih.gov/pubmed/19811682 http://dx.doi.org/10.1186/1471-2105-10-S11-S17

_version_	1782217578481975296
author	Pendarvis, Ken Kumar, Ranjit Burgess, Shane C Nanduri, Bindu
author_facet	Pendarvis, Ken Kumar, Ranjit Burgess, Shane C Nanduri, Bindu
author_sort	Pendarvis, Ken
collection	PubMed
description	BACKGROUND: Mass spectrometry-based protein identification methods are fundamental to proteomics. Biological experiments are usually performed in replicates and proteomic analyses generate huge datasets which need to be integrated and quantitatively analyzed. The Sequest™ search algorithm is a commonly used algorithm for identifying peptides and proteins from two-dimensional liquid chromatography electrospray ionization tandem mass spectrometry (2-D LC ESI MS(2)) data. A number of proteomic pipelines that facilitate high-throughput 'post data acquisition analysis' are described in the literature. However, these pipelines need to be updated to accommodate the rapidly evolving data analysis methods. Here, we describe a proteomic data analysis pipeline that specifically addresses two main issues pertinent to protein identification and differential expression analysis: 1) estimation of the probability of peptide and protein identifications and 2) non-parametric statistics for protein differential expression analysis. Our proteomic analysis workflow analyzes replicate datasets from a single experimental paradigm to generate a list of identified proteins with their probabilities and significant changes in protein expression using parametric and non-parametric statistics. RESULTS: The input for our workflow is Bioworks™ 3.2 Sequest (or a later version, including cluster) output in XML format. We use a decoy database approach to assign probability to peptide identifications. The user has the option to select "quality thresholds" on peptide identifications based on the P value. We also estimate probability for protein identification. Proteins identified with peptides at a user-specified threshold value from biological experiments are grouped as either control or treatment for further analysis in ProtQuant. ProtQuant utilizes a parametric (ANOVA) method for calculating differences in protein expression based on the quantitative measure ΣXcorr. Alternatively, ProtQuant output can be further processed using non-parametric Monte-Carlo resampling statistics to calculate P values for differential expression. Correction for multiple testing of ANOVA and resampling P values is done using Benjamini and Hochberg's method. The results of these statistical analyses are then combined into a single output file containing a comprehensive protein list with probabilities and differential expression analysis, associated P values, and resampling statistics. CONCLUSION: For biologists carrying out proteomics by mass spectrometry, our workflow facilitates automated, easy-to-use analyses of Bioworks (3.2 or later versions) data. All the methods used in the workflow are peer-reviewed and as such the results of our workflow are compliant with proteomic data submission guidelines to public proteomic data repositories including PRIDE. Our workflow is a necessary intermediate step that is required to link proteomics data to biological knowledge for generating testable hypotheses.
format	Online Article Text
id	pubmed-3226188
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3226188 2011-11-30 An automated proteomic data analysis workflow for mass spectrometry Pendarvis, Ken Kumar, Ranjit Burgess, Shane C Nanduri, Bindu BMC Bioinformatics Proceedings BioMed Central 2009-10-08 /pmc/articles/PMC3226188/ /pubmed/19811682 http://dx.doi.org/10.1186/1471-2105-10-S11-S17 Text en Copyright ©2009 Pendarvis et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Proceedings Pendarvis, Ken Kumar, Ranjit Burgess, Shane C Nanduri, Bindu An automated proteomic data analysis workflow for mass spectrometry
title	An automated proteomic data analysis workflow for mass spectrometry
title_full	An automated proteomic data analysis workflow for mass spectrometry
title_fullStr	An automated proteomic data analysis workflow for mass spectrometry
title_full_unstemmed	An automated proteomic data analysis workflow for mass spectrometry
title_short	An automated proteomic data analysis workflow for mass spectrometry
title_sort	automated proteomic data analysis workflow for mass spectrometry
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3226188/ https://www.ncbi.nlm.nih.gov/pubmed/19811682 http://dx.doi.org/10.1186/1471-2105-10-S11-S17
work_keys_str_mv	AT pendarvisken anautomatedproteomicdataanalysisworkflowformassspectrometry AT kumarranjit anautomatedproteomicdataanalysisworkflowformassspectrometry AT burgessshanec anautomatedproteomicdataanalysisworkflowformassspectrometry AT nanduribindu anautomatedproteomicdataanalysisworkflowformassspectrometry AT pendarvisken automatedproteomicdataanalysisworkflowformassspectrometry AT kumarranjit automatedproteomicdataanalysisworkflowformassspectrometry AT burgessshanec automatedproteomicdataanalysisworkflowformassspectrometry AT nanduribindu automatedproteomicdataanalysisworkflowformassspectrometry
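
Illustrative note on the statistics named in the abstract: the record gives no implementation details, so the following is only a minimal sketch of how a Monte-Carlo resampling P value and a Benjamini and Hochberg correction might be computed. The function names, the choice of a label-permutation test on the absolute difference in group means, and the example ΣXcorr values are all assumptions made for illustration; they are not taken from the workflow or from ProtQuant.

```python
import random
from typing import List, Sequence


def resampling_p_value(control: Sequence[float], treatment: Sequence[float],
                       n_iter: int = 10000, seed: int = 0) -> float:
    """Monte-Carlo permutation P value for a difference in group means.

    Hypothetical helper: the test statistic (absolute mean difference)
    is an assumption, not the statistic used by the published workflow.
    """
    rng = random.Random(seed)
    pooled = list(control) + list(treatment)
    n_control = len(control)

    def mean_diff(values: Sequence[float]) -> float:
        a, b = values[:n_control], values[n_control:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_iter):
        # Shuffling the pooled values reassigns group labels at random.
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point ties.
        if mean_diff(pooled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_iter + 1)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up per-replicate ΣXcorr values for two hypothetical proteins.
    proteins = {
        "protein_A": ([10.0, 11.0, 12.0, 10.5], [30.0, 31.0, 29.0, 32.0]),
        "protein_B": ([5.0, 6.0, 5.5, 6.5], [5.2, 6.1, 5.4, 6.6]),
    }
    raw = [resampling_p_value(c, t, n_iter=2000) for c, t in proteins.values()]
    for name, p, q in zip(proteins, raw, benjamini_hochberg(raw)):
        print(name, round(p, 4), round(q, 4))
```

In this sketch each protein is tested independently and the adjustment is applied once across all proteins, which mirrors the order described in the abstract (P values first, multiple-testing correction afterwards).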
