
Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery

BACKGROUND: Mass spectrometry-based biomarker discovery has long been hampered by the difficulty in reconciling lists of discriminatory peaks identified by different laboratories for the same diseases studied. We describe a multi-statistical analysis procedure that combines several independent compu...


Bibliographic Details
Main authors:	Tan, Niclas C, Fisher, Wayne G, Rosenblatt, Kevin P, Garner, Harold R
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2688007/ https://www.ncbi.nlm.nih.gov/pubmed/19442303 http://dx.doi.org/10.1186/1471-2105-10-144

_version_	1782167638783295488
author	Tan, Niclas C Fisher, Wayne G Rosenblatt, Kevin P Garner, Harold R
author_facet	Tan, Niclas C Fisher, Wayne G Rosenblatt, Kevin P Garner, Harold R
author_sort	Tan, Niclas C
collection	PubMed
description	BACKGROUND: Mass spectrometry-based biomarker discovery has long been hampered by the difficulty in reconciling lists of discriminatory peaks identified by different laboratories for the same diseases studied. We describe a multi-statistical analysis procedure that combines several independent computational methods. This approach capitalizes on the strengths of each to analyze the same high-resolution mass spectral data set to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states. RESULTS: The proposed methodology was applied to a pilot narcolepsy study using logistic regression, hierarchical clustering, t-test, and CART. Consensus, differential mass peaks with high predictive power were identified across three of the four statistical platforms. Based on the diagnostic accuracy measures investigated, the performance of the consensus-peak model was a compromise between logistic regression and CART, which produced better models than hierarchical clustering and t-test. However, consensus peaks confer a higher level of confidence in their ability to distinguish between disease states since they do not represent peaks that are a result of biases to a particular statistical algorithm. Instead, they were selected as differential across differing data distribution assumptions, demonstrating their true discriminatory potential. CONCLUSION: The methodology described here is applicable to any high-resolution MALDI mass spectrometry-derived data set with minimal mass drift which is essential for peak-to-peak comparison studies. Four statistical approaches with differing data distribution assumptions were applied to the same raw data set to obtain consensus peaks that were found to be statistically differential between the two groups compared. These consensus peaks demonstrated high diagnostic accuracy when used to form a predictive model as evaluated by receiver operating characteristics curve analysis. They should demonstrate a higher discriminatory ability as they are not biased to a particular algorithm. Thus, they are prime candidates for downstream identification and validation efforts.
format	Text
id	pubmed-2688007
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-26880072009-05-29 Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery Tan, Niclas C Fisher, Wayne G Rosenblatt, Kevin P Garner, Harold R BMC Bioinformatics Research Article BACKGROUND: Mass spectrometry-based biomarker discovery has long been hampered by the difficulty in reconciling lists of discriminatory peaks identified by different laboratories for the same diseases studied. We describe a multi-statistical analysis procedure that combines several independent computational methods. This approach capitalizes on the strengths of each to analyze the same high-resolution mass spectral data set to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states. RESULTS: The proposed methodology was applied to a pilot narcolepsy study using logistic regression, hierarchical clustering, t-test, and CART. Consensus, differential mass peaks with high predictive power were identified across three of the four statistical platforms. Based on the diagnostic accuracy measures investigated, the performance of the consensus-peak model was a compromise between logistic regression and CART, which produced better models than hierarchical clustering and t-test. However, consensus peaks confer a higher level of confidence in their ability to distinguish between disease states since they do not represent peaks that are a result of biases to a particular statistical algorithm. Instead, they were selected as differential across differing data distribution assumptions, demonstrating their true discriminatory potential. CONCLUSION: The methodology described here is applicable to any high-resolution MALDI mass spectrometry-derived data set with minimal mass drift which is essential for peak-to-peak comparison studies. 
Four statistical approaches with differing data distribution assumptions were applied to the same raw data set to obtain consensus peaks that were found to be statistically differential between the two groups compared. These consensus peaks demonstrated high diagnostic accuracy when used to form a predictive model as evaluated by receiver operating characteristics curve analysis. They should demonstrate a higher discriminatory ability as they are not biased to a particular algorithm. Thus, they are prime candidates for downstream identification and validation efforts. BioMed Central 2009-05-14 /pmc/articles/PMC2688007/ /pubmed/19442303 http://dx.doi.org/10.1186/1471-2105-10-144 Text en Copyright © 2009 Tan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Tan, Niclas C Fisher, Wayne G Rosenblatt, Kevin P Garner, Harold R Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery
title	Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2688007/ https://www.ncbi.nlm.nih.gov/pubmed/19442303 http://dx.doi.org/10.1186/1471-2105-10-144
work_keys_str_mv	AT tanniclasc applicationofmultiplestatisticalteststoenhancemassspectrometrybasedbiomarkerdiscovery AT fisherwayneg applicationofmultiplestatisticalteststoenhancemassspectrometrybasedbiomarkerdiscovery AT rosenblattkevinp applicationofmultiplestatisticalteststoenhancemassspectrometrybasedbiomarkerdiscovery AT garnerharoldr applicationofmultiplestatisticalteststoenhancemassspectrometrybasedbiomarkerdiscovery
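
The abstract in this record describes keeping only those mass peaks that several independent statistical methods flag as differential (three of the four methods in the narcolepsy pilot). A minimal sketch of that consensus step follows. It is not the authors' implementation: the function name `consensus_peaks`, the `min_methods` parameter, and every m/z value are hypothetical and invented for illustration, and the sketch assumes the peaks have already been aligned to shared m/z labels so that they can be compared by exact match.

```python
from collections import Counter


def consensus_peaks(selected_by_method, min_methods=3):
    """Return the peaks flagged as differential by at least `min_methods` methods.

    `selected_by_method` maps a method name to the collection of peak labels
    (here, m/z values) that the method selected. Hypothetical helper.
    """
    counts = Counter()
    for peaks in selected_by_method.values():
        counts.update(set(peaks))  # count each peak at most once per method
    return sorted(peak for peak, n in counts.items() if n >= min_methods)


# Hypothetical m/z values, made up for this example (not taken from the study).
selected = {
    "logistic_regression": {1465.7, 2021.1, 3262.4},
    "hierarchical_clustering": {1465.7, 2931.3},
    "t_test": {1465.7, 2021.1, 2931.3},
    "cart": {1465.7, 2021.1, 3262.4},
}

print(consensus_peaks(selected))  # peaks chosen by three or more methods
```

Raising `min_methods` to 4 would demand agreement from every method, while lowering it admits peaks that may reflect the assumptions of only one or two algorithms. Real spectra would also need a mass tolerance when matching peaks, which this sketch leaves out.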

