Classification-based comparison of pre-processing methods for interpretation of mass spectrometry generated clinical datasets

BACKGROUND: Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. Experimental design of mass-spectrometry studies has come under close scrutiny and the importance of strict protocols for sample collection is now understood. However, the quest...

Bibliographic Details
Main authors: Wegdam, Wouter; Moerland, Perry D; Buist, Marrije R; van Themaat, Emiel Ver Loren; Bleijlevens, Boris; Hoefsloot, Huub CJ; de Koster, Chris G; Aerts, Johannes MFG
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2689848/
https://www.ncbi.nlm.nih.gov/pubmed/19442271
http://dx.doi.org/10.1186/1477-5956-7-19
_version_ 1782167818017439744
author Wegdam, Wouter
Moerland, Perry D
Buist, Marrije R
van Themaat, Emiel Ver Loren
Bleijlevens, Boris
Hoefsloot, Huub CJ
de Koster, Chris G
Aerts, Johannes MFG
author_sort Wegdam, Wouter
collection PubMed
description BACKGROUND: Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. Experimental design of mass-spectrometry studies has come under close scrutiny and the importance of strict protocols for sample collection is now understood. However, the question of how best to process the large quantities of data generated is still unanswered. Main challenges for the analysis are the choice of proper pre-processing and classification methods. While these two issues have been investigated in isolation, we propose to use the classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods.
RESULTS: Two in-house generated clinical SELDI-TOF MS datasets are used in this study as an example of high throughput mass-spectrometry data. We perform a systematic comparison of two commonly used pre-processing methods as implemented in Ciphergen ProteinChip Software and in the Cromwell package. With respect to reproducibility, Ciphergen and Cromwell pre-processing are largely comparable. We find that the overlap between peaks detected by either Ciphergen ProteinChip Software or Cromwell is large. This is especially the case for the more stringent peak detection settings. Moreover, similarity of the estimated intensities between matched peaks is high. We evaluate the pre-processing methods using five different classification methods. Classification is done in a double cross-validation protocol using repeated random sampling to obtain an unbiased estimate of classification accuracy. No pre-processing method significantly outperforms the other for all peak detection settings evaluated.
CONCLUSION: We use classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Both pre-processing methods lead to similar classification results on an ovarian cancer and a Gaucher disease dataset. However, the settings for pre-processing parameters lead to large differences in classification accuracy and are therefore of crucial importance. We advocate the evaluation over a range of parameter settings when comparing pre-processing methods. Our analysis also demonstrates that reliable classification results can be obtained with a combination of strict sample handling and a well-defined classification protocol on clinical samples.
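The abstract names a double cross-validation protocol with repeated random sampling but gives no implementation details. The following is a minimal sketch of that general idea, not the authors' code. Several things in it are assumptions made for illustration: the nearest-centroid classifier (the abstract does not name the five classifiers used), the tuning grid over the number of retained features, the stratified splitting, and every function and parameter name. The inner loop picks a tuning parameter using training data only; the outer loop scores the tuned model on samples it has never seen.

```python
import random
from statistics import mean


def fit_nearest_centroid(X, y, n_features):
    """Fit a nearest-centroid rule on the n_features with the largest class gap."""
    classes = sorted(set(y))
    cols = range(len(X[0]))
    centroid = {
        c: [mean(row[j] for row, lab in zip(X, y) if lab == c) for j in cols]
        for c in classes
    }

    def gap(j):
        values = [centroid[c][j] for c in classes]
        return max(values) - min(values)

    keep = sorted(cols, key=gap, reverse=True)[:n_features]

    def predict(row):
        return min(
            classes,
            key=lambda c: sum((row[j] - centroid[c][j]) ** 2 for j in keep),
        )

    return predict


def accuracy(predict, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return mean(1.0 if predict(row) == lab else 0.0 for row, lab in zip(X, y))


def stratified_split(y, test_fraction, rng):
    """Random split that keeps every class in both parts (assumed design)."""
    train, test = [], []
    for c in sorted(set(y)):
        members = [i for i, lab in enumerate(y) if lab == c]
        rng.shuffle(members)
        cut = max(1, round(len(members) * test_fraction))
        test.extend(members[:cut])
        train.extend(members[cut:])
    return train, test


def double_cv(X, y, grid=(1, 2, 5), outer_repeats=20, inner_repeats=5,
              test_fraction=0.3, seed=0):
    """Mean outer-loop accuracy; the inner loop tunes n_features per outer split."""
    rng = random.Random(seed)
    outer_scores = []
    for _ in range(outer_repeats):
        train, test = stratified_split(y, test_fraction, rng)
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in test], [y[i] for i in test]

        def inner_score(k):
            # Repeated random sampling inside the training part only.
            scores = []
            for _ in range(inner_repeats):
                fit_idx, val_idx = stratified_split(y_tr, test_fraction, rng)
                model = fit_nearest_centroid(
                    [X_tr[i] for i in fit_idx], [y_tr[i] for i in fit_idx], k
                )
                scores.append(accuracy(
                    model, [X_tr[i] for i in val_idx], [y_tr[i] for i in val_idx]
                ))
            return mean(scores)

        best_k = max(grid, key=inner_score)
        model = fit_nearest_centroid(X_tr, y_tr, best_k)
        # The held-out test part played no role in choosing best_k.
        outer_scores.append(accuracy(model, X_te, y_te))
    return mean(outer_scores)
```

Calling `double_cv(X, y)` with `X` as a list of per-sample peak-intensity rows and `y` as the class labels returns one accuracy estimate. Repeating the call on the output of each pre-processing setting would give the kind of comparison the abstract describes.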
format Text
id pubmed-2689848
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2689848 2009-06-03 Proteome Sci Research BioMed Central 2009-05-14 /pmc/articles/PMC2689848/ /pubmed/19442271 http://dx.doi.org/10.1186/1477-5956-7-19 Text en Copyright © 2009 Wegdam et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Classification-based comparison of pre-processing methods for interpretation of mass spectrometry generated clinical datasets
topic Research