Classification of heterogeneous microarray data by maximum entropy kernel
BACKGROUND: There is a large amount of microarray data accumulating in public databases, providing various data waiting to be analyzed jointly. Powerful kernel-based methods are commonly used in microarray analyses with support vector machines (SVMs) to approach a wide range of classification proble...
Main authors: | Fujibuchi, Wataru, Kato, Tsuyoshi |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2007 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1994960/ https://www.ncbi.nlm.nih.gov/pubmed/17651507 http://dx.doi.org/10.1186/1471-2105-8-267 |
_version_ | 1782135504019390464 |
---|---|
author | Fujibuchi, Wataru Kato, Tsuyoshi |
author_facet | Fujibuchi, Wataru Kato, Tsuyoshi |
author_sort | Fujibuchi, Wataru |
collection | PubMed |
description | BACKGROUND: There is a large amount of microarray data accumulating in public databases, providing various data waiting to be analyzed jointly. Powerful kernel-based methods are commonly used in microarray analyses with support vector machines (SVMs) to approach a wide range of classification problems. However, the standard vectorial data kernel family (linear, RBF, etc.), which takes vectorial data as input, often fails in prediction if the data come from different platforms or laboratories, due to the low gene overlap or consistency between the different datasets. RESULTS: We introduce a new type of kernel, called the maximum entropy (ME) kernel, into the field of SVM classification of microarray data; it has no pre-defined function but is generated by kernel entropy maximization with sample distance matrices as constraints. We assessed the performance of the ME kernel with three different datasets: heterogeneous kidney carcinoma, noise-introduced leukemia, and heterogeneous oral cavity carcinoma metastasis data. The results clearly show that the ME kernel is very robust for heterogeneous data containing missing values and high noise, and gives higher prediction accuracies than the standard kernels, namely linear, polynomial, and RBF. CONCLUSION: The results demonstrate its utility in effectively analyzing promiscuous microarray data of rare specimens, e.g., minor diseases or species, that present difficulty in compiling homogeneous data in a single laboratory. |
format | Text |
id | pubmed-1994960 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1994960 2007-09-28 Classification of heterogeneous microarray data by maximum entropy kernel Fujibuchi, Wataru Kato, Tsuyoshi BMC Bioinformatics Research Article BioMed Central 2007-07-26 /pmc/articles/PMC1994960/ /pubmed/17651507 http://dx.doi.org/10.1186/1471-2105-8-267 Text en Copyright © 2007 Fujibuchi and Kato; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Fujibuchi, Wataru Kato, Tsuyoshi Classification of heterogeneous microarray data by maximum entropy kernel |
title | Classification of heterogeneous microarray data by maximum entropy kernel |
title_sort | classification of heterogeneous microarray data by maximum entropy kernel |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1994960/ https://www.ncbi.nlm.nih.gov/pubmed/17651507 http://dx.doi.org/10.1186/1471-2105-8-267 |
work_keys_str_mv | AT fujibuchiwataru classificationofheterogeneousmicroarraydatabymaximumentropykernel AT katotsuyoshi classificationofheterogeneousmicroarraydatabymaximumentropykernel |