Classification across gene expression microarray studies

Bibliographic Details
Main authors: Buness, Andreas; Ruschhaupt, Markus; Kuner, Ruprecht; Tresch, Achim
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2811711/
https://www.ncbi.nlm.nih.gov/pubmed/20042109
http://dx.doi.org/10.1186/1471-2105-10-453
author Buness, Andreas
Ruschhaupt, Markus
Kuner, Ruprecht
Tresch, Achim
collection PubMed
description BACKGROUND: The increasing number of gene expression microarray studies represents an important resource in biomedical research. As a result, gene-expression-based diagnosis has entered clinical practice for patient stratification in breast cancer. However, the integration and combined analysis of microarray studies remains a challenge. We assessed the potential benefit of data integration on classification accuracy and systematically evaluated the generalization performance of selected methods on four breast cancer studies comprising almost 1000 independent samples. To this end, we introduced an evaluation framework which aims to establish good statistical practice and a graphical way to monitor differences. The classification goal was to correctly predict the estrogen receptor status (negative/positive) and histological grade (low/high) of each tumor sample in an independent study that was not used for training. For classification we chose support vector machines (SVM), predictive analysis of microarrays (PAM), random forest (RF) and k-top scoring pairs (kTSP). Guided by considerations relevant for classification across studies, we developed a generalization of kTSP, which we evaluated in addition. Our derived version (DV) aims to improve the robustness of the intrinsic invariance of kTSP with respect to technologies and preprocessing. RESULTS: For each individual study, the generalization error was benchmarked via complete cross-validation and was found to be similar for all classification methods. The misclassification rates were substantially higher in classification across studies, where each single study was used as an independent test set while all remaining studies were combined to train the classifier. However, as the number of independent microarray studies used in training increased, the overall classification performance improved. DV performed better than the average and showed slightly less variance. In particular, the better predictive results of DV in cross-platform classification indicate higher robustness of the classifier when trained on single-channel data and applied to gene expression ratios. CONCLUSIONS: We present a systematic evaluation of strategies for the integration of independent microarray studies in a classification task. Our findings on classification across studies may guide further research aimed at constructing more robust and reliable methods for stratification and diagnosis in clinical practice.
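The cross-study protocol described in the abstract (hold out one study as the test set, train on all the others) can be illustrated with a minimal sketch. It assumes a single top-scoring gene pair (k = 1) instead of the kTSP or DV classifiers evaluated in the paper, and the function names (`fit_tsp`, `predict_tsp`, `leave_one_study_out`) and the toy data are hypothetical. It is not the authors' implementation.

```python
from itertools import combinations


def fit_tsp(X, y):
    """Pick the gene pair (i, j) whose ordering x[i] < x[j] best separates the classes.

    X is a list of samples (lists of expression values), y a list of 0/1 labels.
    Both classes must be present. Returns (i, j, label_when_less).
    """
    pos = [x for x, lab in zip(X, y) if lab == 1]
    neg = [x for x, lab in zip(X, y) if lab == 0]
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        # Fraction of samples in each class where gene i is below gene j.
        p1 = sum(x[i] < x[j] for x in pos) / len(pos)
        p0 = sum(x[i] < x[j] for x in neg) / len(neg)
        score = abs(p1 - p0)
        if best is None or score > best[0]:
            best = (score, i, j, 1 if p1 > p0 else 0)
    _, i, j, label_when_less = best
    return i, j, label_when_less


def predict_tsp(model, x):
    """Predict from the relative order of the two genes only."""
    i, j, label_when_less = model
    return label_when_less if x[i] < x[j] else 1 - label_when_less


def leave_one_study_out(studies):
    """studies maps a study name to (X, y); returns the error rate per held-out study."""
    errors = {}
    for held_out, (X_test, y_test) in studies.items():
        X_train, y_train = [], []
        for name, (X, y) in studies.items():
            if name != held_out:
                X_train.extend(X)
                y_train.extend(y)
        model = fit_tsp(X_train, y_train)
        wrong = sum(predict_tsp(model, x) != lab for x, lab in zip(X_test, y_test))
        errors[held_out] = wrong / len(y_test)
    return errors


# Hypothetical toy data: three "studies" on different scales (study C mimics ratios).
studies = {
    "A": ([[1.0, 2.0, 5.0], [1.5, 3.0, 5.5], [4.0, 2.0, 5.2], [3.5, 1.0, 4.9]], [1, 1, 0, 0]),
    "B": ([[10, 20, 50], [12, 30, 45], [40, 20, 52], [35, 10, 49]], [1, 1, 0, 0]),
    "C": ([[-1.0, -0.5, 0.3], [-0.8, 0.1, 0.4], [0.2, -0.6, 0.5], [0.0, -0.9, 0.1]], [1, 1, 0, 0]),
}
errors = leave_one_study_out(studies)
print(errors)
```

Because the prediction depends only on which of two genes is higher within a sample, it is unchanged by any monotone rescaling of that sample. This is the kind of invariance to technology and preprocessing that the abstract attributes to kTSP. In the toy data the ordering of genes 0 and 1 holds on every scale, so each held-out study is classified without error.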
format Text
id pubmed-2811711
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2811711 2010-01-27 Classification across gene expression microarray studies Buness, Andreas Ruschhaupt, Markus Kuner, Ruprecht Tresch, Achim BMC Bioinformatics Research article BioMed Central 2009-12-30 /pmc/articles/PMC2811711/ /pubmed/20042109 http://dx.doi.org/10.1186/1471-2105-10-453 Text en Copyright ©2009 Buness et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Classification across gene expression microarray studies
topic Research article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2811711/
https://www.ncbi.nlm.nih.gov/pubmed/20042109
http://dx.doi.org/10.1186/1471-2105-10-453