Classification of heterogeneous microarray data by maximum entropy kernel

BACKGROUND: There is a large amount of microarray data accumulating in public databases, providing various data waiting to be analyzed jointly. Powerful kernel-based methods are commonly used in microarray analyses with support vector machines (SVMs) to approach a wide range of classification proble...

Bibliographic Details
Main Authors: Fujibuchi, Wataru; Kato, Tsuyoshi
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1994960/
https://www.ncbi.nlm.nih.gov/pubmed/17651507
http://dx.doi.org/10.1186/1471-2105-8-267
author Fujibuchi, Wataru
Kato, Tsuyoshi
collection PubMed
description BACKGROUND: There is a large amount of microarray data accumulating in public databases, providing various data waiting to be analyzed jointly. Powerful kernel-based methods are commonly used in microarray analyses with support vector machines (SVMs) to approach a wide range of classification problems. However, the standard vectorial data kernel family (linear, RBF, etc.), which takes vectorial data as input, often fails in prediction if the data come from different platforms or laboratories, owing to the low gene overlap or consistency between the different datasets. RESULTS: We introduce into the field of SVM classification of microarray data a new type of kernel called the maximum entropy (ME) kernel, which has no pre-defined function but is generated by kernel entropy maximization with sample distance matrices as constraints. We assessed the performance of the ME kernel with three different datasets: heterogeneous kidney carcinoma, noise-introduced leukemia, and heterogeneous oral cavity carcinoma metastasis data. The results clearly show that the ME kernel is very robust for heterogeneous data containing missing values and high noise, and gives higher prediction accuracies than the standard kernels, namely linear, polynomial, and RBF. CONCLUSION: The results demonstrate the utility of the ME kernel in effectively analyzing promiscuous microarray data of rare specimens, e.g., minor diseases or species, for which homogeneous data are difficult to compile in a single laboratory.
format Text
id pubmed-1994960
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1994960 2007-09-28 Classification of heterogeneous microarray data by maximum entropy kernel Fujibuchi, Wataru Kato, Tsuyoshi BMC Bioinformatics Research Article BioMed Central 2007-07-26 /pmc/articles/PMC1994960/ /pubmed/17651507 http://dx.doi.org/10.1186/1471-2105-8-267 Text en Copyright © 2007 Fujibuchi and Kato; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Classification of heterogeneous microarray data by maximum entropy kernel
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1994960/
https://www.ncbi.nlm.nih.gov/pubmed/17651507
http://dx.doi.org/10.1186/1471-2105-8-267