Robust assignment of cancer subtypes from expression data using a uni-variate gene expression average as classifier

Bibliographic Details
Main authors: Lauss, Martin, Frigyesi, Attila, Ryden, Tobias, Höglund, Mattias
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2966465/
https://www.ncbi.nlm.nih.gov/pubmed/20925936
http://dx.doi.org/10.1186/1471-2407-10-532
author Lauss, Martin
Frigyesi, Attila
Ryden, Tobias
Höglund, Mattias
collection PubMed
description BACKGROUND: Genome-wide gene expression data are a rich source for the identification of gene signatures suitable for clinical purposes, and a number of statistical algorithms have been described for both the identification and the evaluation of such signatures. Some of the algorithms employed are fairly complex and hence sensitive to over-fitting, whereas others are simpler and more straightforward. Here we present a new type of simple algorithm, based on ROC analysis and the use of metagenes, that we believe will be a good complement to existing algorithms. RESULTS: The basis for the proposed approach is the use of metagenes, instead of collections of individual genes, and feature selection using AUC values obtained by ROC analysis. Each gene in a data set is assigned an AUC value relative to the tumor class under investigation, and the genes are ranked according to these values. Metagenes are then formed by calculating the mean expression level for an increasing number of ranked genes, and the metagene expression value that optimally discriminates tumor classes in the training set is used for classification of new samples. The performance of the metagene is then evaluated using leave-one-out cross-validation (LOOCV) and balanced accuracies. CONCLUSIONS: We show that the simple uni-variate gene expression average algorithm performs as well as several alternative algorithms, such as discriminant analysis, and as more complex approaches, such as SVM and neural networks. The R package rocc is freely available at http://cran.r-project.org/web/packages/rocc/index.html.
format Text
id pubmed-2966465
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2966465 2010-10-30 Lauss, Martin; Frigyesi, Attila; Ryden, Tobias; Höglund, Mattias. BMC Cancer. Software. BioMed Central, 2010-10-06. /pmc/articles/PMC2966465/ /pubmed/20925936 http://dx.doi.org/10.1186/1471-2407-10-532 Text, en. Copyright ©2010 Lauss et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Robust assignment of cancer subtypes from expression data using a uni-variate gene expression average as classifier
topic Software
work_keys_str_mv AT laussmartin robustassignmentofcancersubtypesfromexpressiondatausingaunivariategeneexpressionaverageasclassifier
AT frigyesiattila robustassignmentofcancersubtypesfromexpressiondatausingaunivariategeneexpressionaverageasclassifier
AT rydentobias robustassignmentofcancersubtypesfromexpressiondatausingaunivariategeneexpressionaverageasclassifier
AT hoglundmattias robustassignmentofcancersubtypesfromexpressiondatausingaunivariategeneexpressionaverageasclassifier
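The procedure outlined in the description (per-gene AUC ranking, metagenes built from an increasing number of top-ranked genes, a discriminating cutoff on the metagene value, and evaluation by LOOCV with balanced accuracy) can be illustrated with a small sketch. This is not the code of the rocc R package. The function names (`auc`, `fit`, `predict`, `loocv`), the toy data, the tie-breaking rule that prefers fewer genes, and the choice of cutoff as the midpoint that maximises training balanced accuracy are all assumptions made for illustration; the record itself does not specify these details.

```python
def auc(scores, labels):
    """AUC as the probability that a class-1 sample scores above a
    class-0 sample (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metagene(sample, genes):
    """Mean expression of the selected genes in one sample."""
    return sum(sample[g] for g in genes) / len(genes)


def balanced_accuracy(truth, predicted):
    """Average of sensitivity and specificity."""
    pos = [p for t, p in zip(truth, predicted) if t == 1]
    neg = [p for t, p in zip(truth, predicted) if t == 0]
    sensitivity = sum(p == 1 for p in pos) / len(pos)
    specificity = sum(p == 0 for p in neg) / len(neg)
    return (sensitivity + specificity) / 2


def fit(X, y):
    """X: list of samples (lists of expression values); y: 0/1 labels.
    Returns (selected gene indices, cutoff on the metagene value)."""
    n_genes = len(X[0])
    # Rank genes by their single-gene AUC, highest first.
    gene_auc = [auc([row[g] for row in X], y) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: gene_auc[g], reverse=True)

    # Grow the metagene one ranked gene at a time; keep the size with the
    # best training AUC (assumption: ties go to the smaller gene set).
    best_auc, best_genes, best_scores = -1.0, None, None
    for k in range(1, n_genes + 1):
        genes = ranked[:k]
        scores = [metagene(row, genes) for row in X]
        current = auc(scores, y)
        if current > best_auc:
            best_auc, best_genes, best_scores = current, genes, scores

    # Assumption: the cutoff is the midpoint between adjacent metagene
    # values that maximises balanced accuracy on the training set.
    levels = sorted(set(best_scores))
    cutoffs = [(lo + hi) / 2 for lo, hi in zip(levels, levels[1:])] or levels
    cutoff = max(
        cutoffs,
        key=lambda c: balanced_accuracy(y, [int(s > c) for s in best_scores]),
    )
    return best_genes, cutoff


def predict(model, sample):
    """Assign class 1 if the sample's metagene value exceeds the cutoff."""
    genes, cutoff = model
    return int(metagene(sample, genes) > cutoff)


def loocv(X, y):
    """Leave-one-out cross-validation; gene selection is repeated in every
    fold so that the held-out sample never influences the model."""
    predictions = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predictions.append(predict(model, X[i]))
    return balanced_accuracy(y, predictions)


# Hypothetical toy data: 6 samples x 3 genes, class 1 = tumor class of interest.
X = [
    [5.0, 2.0, 1.0],
    [6.0, 3.5, 1.5],
    [5.5, 1.0, 0.5],
    [1.0, 2.5, 4.0],
    [1.5, 3.0, 5.0],
    [0.5, 1.5, 4.5],
]
y = [1, 1, 1, 0, 0, 0]

model = fit(X, y)
print("selected genes:", model[0], "cutoff:", model[1])
print("LOOCV balanced accuracy:", loocv(X, y))
```

Repeating the gene ranking inside each cross-validation fold is a deliberate choice in this sketch: ranking genes on the full data set before holding samples out would leak information and inflate the estimated accuracy.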