
A Comparison of Methods for Data-Driven Cancer Outlier Discovery, and An Application Scheme to Semisupervised Predictive Biomarker Discovery

A core component in translational cancer research is biomarker discovery using gene expression profiling for clinical tumors. This is often based on cell line experiments; one population is sampled for inference in another. We disclose a semisupervised workflow focusing on binary (switch-like, bimod...


Bibliographic Details
Main Authors: Karrila, Seppo; Lee, Julian Hock Ean; Tucker-Kellogg, Greg
Format: Text
Language: English
Published: Libertas Academica 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3091411/
https://www.ncbi.nlm.nih.gov/pubmed/21584264
http://dx.doi.org/10.4137/CIN.S6868
_version_ 1782203253612609536
author Karrila, Seppo
Lee, Julian Hock Ean
Tucker-Kellogg, Greg
collection PubMed
description A core component in translational cancer research is biomarker discovery using gene expression profiling for clinical tumors. This is often based on cell line experiments; one population is sampled for inference in another. We disclose a semisupervised workflow focusing on binary (switch-like, bimodal) informative genes that are likely cancer relevant, to mitigate this non-statistical problem. Outlier detection is a key enabling technology of the workflow, and aids in identifying the focus genes. We compare outlier detection techniques MOST, LSOSS, COPA, ORT, OS, and t-test, using a publicly available NSCLC dataset. Removing genes with Gaussian distribution is computationally efficient and matches MOST particularly well, while also COPA and OS pick prognostically relevant genes in their top ranks. Also our stability assessment is in favour of both MOST and COPA; the latter does not pair well with prefiltering for non-Gaussianity, but can handle data sets lacking non-cancer cases. We provide R code for replicating our approach or extending it.
format Text
id pubmed-3091411
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-30914112011-05-16 A Comparison of Methods for Data-Driven Cancer Outlier Discovery, and An Application Scheme to Semisupervised Predictive Biomarker Discovery Karrila, Seppo Lee, Julian Hock Ean Tucker-Kellogg, Greg Cancer Inform Original Research A core component in translational cancer research is biomarker discovery using gene expression profiling for clinical tumors. This is often based on cell line experiments; one population is sampled for inference in another. We disclose a semisupervised workflow focusing on binary (switch-like, bimodal) informative genes that are likely cancer relevant, to mitigate this non-statistical problem. Outlier detection is a key enabling technology of the workflow, and aids in identifying the focus genes. We compare outlier detection techniques MOST, LSOSS, COPA, ORT, OS, and t-test, using a publicly available NSCLC dataset. Removing genes with Gaussian distribution is computationally efficient and matches MOST particularly well, while also COPA and OS pick prognostically relevant genes in their top ranks. Also our stability assessment is in favour of both MOST and COPA; the latter does not pair well with prefiltering for non-Gaussianity, but can handle data sets lacking non-cancer cases. We provide R code for replicating our approach or extending it. Libertas Academica 2011-04-18 /pmc/articles/PMC3091411/ /pubmed/21584264 http://dx.doi.org/10.4137/CIN.S6868 Text en © the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.
title A Comparison of Methods for Data-Driven Cancer Outlier Discovery, and An Application Scheme to Semisupervised Predictive Biomarker Discovery
topic Original Research