A Comparison of Methods for Data-Driven Cancer Outlier Discovery, and An Application Scheme to Semisupervised Predictive Biomarker Discovery
A core component in translational cancer research is biomarker discovery using gene expression profiling for clinical tumors. This is often based on cell line experiments; one population is sampled for inference in another. We disclose a semisupervised workflow focusing on binary (switch-like, bimod...
Main authors: | Karrila, Seppo; Lee, Julian Hock Ean; Tucker-Kellogg, Greg |
---|---|
Format: | Text |
Language: | English |
Published: | Libertas Academica, 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3091411/ https://www.ncbi.nlm.nih.gov/pubmed/21584264 http://dx.doi.org/10.4137/CIN.S6868 |
_version_ | 1782203253612609536 |
---|---|
author | Karrila, Seppo Lee, Julian Hock Ean Tucker-Kellogg, Greg |
author_sort | Karrila, Seppo |
collection | PubMed |
description | A core component in translational cancer research is biomarker discovery using gene expression profiling for clinical tumors. This is often based on cell line experiments; one population is sampled for inference in another. We disclose a semisupervised workflow focusing on binary (switch-like, bimodal) informative genes that are likely cancer relevant, to mitigate this non-statistical problem. Outlier detection is a key enabling technology of the workflow, and aids in identifying the focus genes. We compare outlier detection techniques MOST, LSOSS, COPA, ORT, OS, and t-test, using a publicly available NSCLC dataset. Removing genes with Gaussian distribution is computationally efficient and matches MOST particularly well, while also COPA and OS pick prognostically relevant genes in their top ranks. Also our stability assessment is in favour of both MOST and COPA; the latter does not pair well with prefiltering for non-Gaussianity, but can handle data sets lacking non-cancer cases. We provide R code for replicating our approach or extending it. |
format | Text |
id | pubmed-3091411 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-30914112011-05-16 A Comparison of Methods for Data-Driven Cancer Outlier Discovery, and An Application Scheme to Semisupervised Predictive Biomarker Discovery Karrila, Seppo Lee, Julian Hock Ean Tucker-Kellogg, Greg Cancer Inform Original Research A core component in translational cancer research is biomarker discovery using gene expression profiling for clinical tumors. This is often based on cell line experiments; one population is sampled for inference in another. We disclose a semisupervised workflow focusing on binary (switch-like, bimodal) informative genes that are likely cancer relevant, to mitigate this non-statistical problem. Outlier detection is a key enabling technology of the workflow, and aids in identifying the focus genes. We compare outlier detection techniques MOST, LSOSS, COPA, ORT, OS, and t-test, using a publicly available NSCLC dataset. Removing genes with Gaussian distribution is computationally efficient and matches MOST particularly well, while also COPA and OS pick prognostically relevant genes in their top ranks. Also our stability assessment is in favour of both MOST and COPA; the latter does not pair well with prefiltering for non-Gaussianity, but can handle data sets lacking non-cancer cases. We provide R code for replicating our approach or extending it. Libertas Academica 2011-04-18 /pmc/articles/PMC3091411/ /pubmed/21584264 http://dx.doi.org/10.4137/CIN.S6868 Text en © the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited. |
title | A Comparison of Methods for Data-Driven Cancer Outlier Discovery, and An Application Scheme to Semisupervised Predictive Biomarker Discovery |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3091411/ https://www.ncbi.nlm.nih.gov/pubmed/21584264 http://dx.doi.org/10.4137/CIN.S6868 |