
A Comparison of Methods for Classifying Clinical Samples Based on Proteomics Data: A Case Study for Statistical and Machine Learning Approaches

The discovery of protein variation is an important strategy in disease diagnosis within the biological sciences. The current benchmark for elucidating information from multiple biological variables is the so called “omics” disciplines of the biological sciences. Such variability is uncovered by impl...


Bibliographic Details
Main Authors:	Sampson, Dayle L.; Parker, Tony J.; Upton, Zee; Hurst, Cameron P.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3182169/ https://www.ncbi.nlm.nih.gov/pubmed/21969867 http://dx.doi.org/10.1371/journal.pone.0024973

_version_	1782212876223643648
author	Sampson, Dayle L.; Parker, Tony J.; Upton, Zee; Hurst, Cameron P.
collection	PubMed
description	The discovery of protein variation is an important strategy in disease diagnosis within the biological sciences. The current benchmark for elucidating information from multiple biological variables is the so called “omics” disciplines of the biological sciences. Such variability is uncovered by implementation of multivariable data mining techniques which come under two primary categories, machine learning strategies and statistical based approaches. Typically proteomic studies can produce hundreds or thousands of variables, p, per observation, n, depending on the analytical platform or method employed to generate the data. Many classification methods are limited by an n≪p constraint, and as such, require pre-treatment to reduce the dimensionality prior to classification. Recently machine learning techniques have gained popularity in the field for their ability to successfully classify unknown samples. One limitation of such methods is the lack of a functional model allowing meaningful interpretation of results in terms of the features used for classification. This is a problem that might be solved using a statistical model-based approach where not only is the importance of the individual protein explicit, they are combined into a readily interpretable classification rule without relying on a black box approach. Here we incorporate statistical dimension reduction techniques Partial Least Squares (PLS) and Principal Components Analysis (PCA) followed by both statistical and machine learning classification methods, and compared them to a popular machine learning technique, Support Vector Machines (SVM). Both PLS and SVM demonstrate strong utility for proteomic classification problems.
format	Online Article Text
id	pubmed-3182169
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3182169 2011-10-03 PLoS One Research Article Public Library of Science 2011-09-28 /pmc/articles/PMC3182169/ /pubmed/21969867 http://dx.doi.org/10.1371/journal.pone.0024973 Text en Sampson et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	A Comparison of Methods for Classifying Clinical Samples Based on Proteomics Data: A Case Study for Statistical and Machine Learning Approaches
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3182169/ https://www.ncbi.nlm.nih.gov/pubmed/21969867 http://dx.doi.org/10.1371/journal.pone.0024973
