
An Algorithm for Finding Biologically Significant Features in Microarray Data Based on A Priori Manifold Learning

Microarray databases are a large source of genetic data, which, upon proper analysis, could enhance our understanding of biology and medicine. Many microarray experiments have been designed to investigate the genetic mechanisms of cancer, and analytical approaches have been applied in order to class...


Bibliographic Details
Main authors:	Hira, Zena M.; Trigeorgis, George; Gillies, Duncan F.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3940899/ https://www.ncbi.nlm.nih.gov/pubmed/24595155 http://dx.doi.org/10.1371/journal.pone.0090562

collection	PubMed
description	Microarray databases are a large source of genetic data, which, upon proper analysis, could enhance our understanding of biology and medicine. Many microarray experiments have been designed to investigate the genetic mechanisms of cancer, and analytical approaches have been applied in order to classify different types of cancer or distinguish between cancerous and non-cancerous tissue. However, microarrays are high-dimensional datasets with high levels of noise, and this causes problems when using machine learning methods. A popular approach to this problem is to search for a set of features that will simplify the structure and to some degree remove the noise from the data. The most widely used approach to feature extraction is principal component analysis (PCA), which assumes a multivariate Gaussian model of the data. More recently, non-linear methods have been investigated. Among these, manifold learning algorithms, for example Isomap, aim to project the data from a higher-dimensional space onto a lower-dimensional one. We have proposed a priori manifold learning for finding a manifold in which a representative set of microarray data is fused with relevant data taken from the KEGG pathway database. Once the manifold has been constructed, the raw microarray data is projected onto it and clustering and classification can take place. In contrast to earlier fusion-based methods, the prior knowledge from the KEGG databases is not used in, and does not bias, the classification process; it merely acts as an aid to find the best space in which to search the data. In our experiments we have found that using our new manifold method gives better classification results than using either PCA or conventional Isomap.
id	pubmed-3940899
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-3940899 2014-03-06 PLoS One Research Article Public Library of Science 2014-03-03 Text en © 2014 Hira et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
