Classification-based comparison of pre-processing methods for interpretation of mass spectrometry generated clinical datasets

BACKGROUND: Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. Experimental design of mass-spectrometry studies has come under close scrutiny and the importance of strict protocols for sample collection is now understood. However, the quest...

Bibliographic Details
Main authors:	Wegdam, Wouter; Moerland, Perry D; Buist, Marrije R; van Themaat, Emiel Ver Loren; Bleijlevens, Boris; Hoefsloot, Huub CJ; de Koster, Chris G; Aerts, Johannes MFG
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2689848/ https://www.ncbi.nlm.nih.gov/pubmed/19442271 http://dx.doi.org/10.1186/1477-5956-7-19

author	Wegdam, Wouter; Moerland, Perry D; Buist, Marrije R; van Themaat, Emiel Ver Loren; Bleijlevens, Boris; Hoefsloot, Huub CJ; de Koster, Chris G; Aerts, Johannes MFG
collection	PubMed
description	BACKGROUND: Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. Experimental design of mass-spectrometry studies has come under close scrutiny and the importance of strict protocols for sample collection is now understood. However, the question of how best to process the large quantities of data generated is still unanswered. Main challenges for the analysis are the choice of proper pre-processing and classification methods. While these two issues have been investigated in isolation, we propose to use the classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. RESULTS: Two in-house generated clinical SELDI-TOF MS datasets are used in this study as an example of high throughput mass-spectrometry data. We perform a systematic comparison of two commonly used pre-processing methods as implemented in Ciphergen ProteinChip Software and in the Cromwell package. With respect to reproducibility, Ciphergen and Cromwell pre-processing are largely comparable. We find that the overlap between peaks detected by either Ciphergen ProteinChip Software or Cromwell is large. This is especially the case for the more stringent peak detection settings. Moreover, similarity of the estimated intensities between matched peaks is high. We evaluate the pre-processing methods using five different classification methods. Classification is done in a double cross-validation protocol using repeated random sampling to obtain an unbiased estimate of classification accuracy. No pre-processing method significantly outperforms the other for all peak detection settings evaluated. CONCLUSION: We use classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Both pre-processing methods lead to similar classification results on an ovarian cancer and a Gaucher disease dataset. However, the settings for pre-processing parameters lead to large differences in classification accuracy and are therefore of crucial importance. We advocate the evaluation over a range of parameter settings when comparing pre-processing methods. Our analysis also demonstrates that reliable classification results can be obtained with a combination of strict sample handling and a well-defined classification protocol on clinical samples.
format	Text
id	pubmed-2689848
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2689848 2009-06-03 Proteome Sci Research BioMed Central 2009-05-14 /pmc/articles/PMC2689848/ /pubmed/19442271 http://dx.doi.org/10.1186/1477-5956-7-19 Text en Copyright © 2009 Wegdam et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Classification-based comparison of pre-processing methods for interpretation of mass spectrometry generated clinical datasets
topic	Research
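
The abstract describes classification "in a double cross-validation protocol using repeated random sampling". As a rough illustration of what such a protocol can look like, here is a minimal Python sketch. It is not the authors' implementation: the nearest-centroid classifier, the tuned parameter `k` (number of peaks kept), the split fraction, the repetition counts, and all function names are assumptions chosen for illustration, and the record does not say which five classifiers or which settings the study used. The point it shows is that the parameter is chosen in an inner loop that only ever sees training samples, so the outer test accuracy is not biased by the tuning.

```python
import random


def fit(X, y, k):
    """Nearest-centroid model using the k peaks whose class means differ most.

    Hypothetical stand-in classifier; not one named in the record.
    """
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    n_features = len(X[0])
    # Score each peak by the spread of its class means; keep the top k.
    spread = [
        max(centroids[c][j] for c in classes) - min(centroids[c][j] for c in classes)
        for j in range(n_features)
    ]
    keep = sorted(range(n_features), key=lambda j: spread[j], reverse=True)[:k]
    return centroids, keep


def predict(model, x):
    """Return the class whose centroid is closest on the selected peaks."""
    centroids, keep = model
    return min(centroids, key=lambda c: sum((x[j] - centroids[c][j]) ** 2 for j in keep))


def accuracy(model, X, y):
    hits = sum(predict(model, x) == label for x, label in zip(X, y))
    return hits / len(y)


def random_split(indices, test_fraction, rng):
    """Shuffle the indices and return (training part, held-out part)."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def subset(X, y, idx):
    return [X[i] for i in idx], [y[i] for i in idx]


def double_cross_validation(X, y, k_grid, n_outer=20, n_inner=10,
                            test_fraction=0.3, seed=0):
    """Mean outer-loop test accuracy; k is tuned on training samples only."""
    rng = random.Random(seed)
    outer_scores = []
    for _ in range(n_outer):
        train_idx, test_idx = random_split(range(len(X)), test_fraction, rng)

        def inner_score(k):
            # Inner loop: repeated random splits of the training samples only.
            scores = []
            for _ in range(n_inner):
                fit_idx, val_idx = random_split(train_idx, test_fraction, rng)
                model = fit(*subset(X, y, fit_idx), k)
                scores.append(accuracy(model, *subset(X, y, val_idx)))
            return sum(scores) / len(scores)

        best_k = max(k_grid, key=inner_score)
        # Refit on all training samples, then score once on the untouched test set.
        model = fit(*subset(X, y, train_idx), best_k)
        outer_scores.append(accuracy(model, *subset(X, y, test_idx)))
    return sum(outer_scores) / len(outer_scores)


def make_toy_data(n_per_class=20, n_peaks=10, shift=4.0, seed=1):
    """Synthetic two-class 'peak intensity' table; peaks 0 and 1 are informative."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_peaks)]
            row[0] += shift * label
            row[1] += shift * label
            X.append(row)
            y.append(label)
    return X, y
```

A call such as `double_cross_validation(*make_toy_data(), k_grid=[1, 2, 5, 10])` returns the accuracy averaged over the outer repetitions. To compare two pre-processing methods in this spirit, one would run the same protocol on the peak table produced by each method and compare the resulting accuracy estimates.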
