A Comparison of Methods for Data-Driven Cancer Outlier Discovery, and An Application Scheme to Semisupervised Predictive Biomarker Discovery

A core component in translational cancer research is biomarker discovery using gene expression profiling for clinical tumors. This is often based on cell line experiments; one population is sampled for inference in another. We disclose a semisupervised workflow focusing on binary (switch-like, bimod...

Bibliographic Details
Main Authors:	Karrila, Seppo; Lee, Julian Hock Ean; Tucker-Kellogg, Greg
Format:	Text
Language:	English
Published:	Libertas Academica 2011
Subjects:	Original Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3091411/ https://www.ncbi.nlm.nih.gov/pubmed/21584264 http://dx.doi.org/10.4137/CIN.S6868

author	Karrila, Seppo; Lee, Julian Hock Ean; Tucker-Kellogg, Greg
collection	PubMed
description	A core component in translational cancer research is biomarker discovery using gene expression profiling for clinical tumors. This is often based on cell line experiments; one population is sampled for inference in another. We disclose a semisupervised workflow focusing on binary (switch-like, bimodal) informative genes that are likely cancer relevant, to mitigate this non-statistical problem. Outlier detection is a key enabling technology of the workflow, and aids in identifying the focus genes. We compare outlier detection techniques MOST, LSOSS, COPA, ORT, OS, and t-test, using a publicly available NSCLC dataset. Removing genes with Gaussian distribution is computationally efficient and matches MOST particularly well, while also COPA and OS pick prognostically relevant genes in their top ranks. Also our stability assessment is in favour of both MOST and COPA; the latter does not pair well with prefiltering for non-Gaussianity, but can handle data sets lacking non-cancer cases. We provide R code for replicating our approach or extending it.
format	Text
id	pubmed-3091411
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Libertas Academica
record_format	MEDLINE/PubMed
spelling	pubmed-3091411 2011-05-16 Cancer Inform Original Research Libertas Academica 2011-04-18 /pmc/articles/PMC3091411/ /pubmed/21584264 http://dx.doi.org/10.4137/CIN.S6868 Text en © the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.
title	A Comparison of Methods for Data-Driven Cancer Outlier Discovery, and An Application Scheme to Semisupervised Predictive Biomarker Discovery
topic	Original Research
