Unsupervised gene selection using biological knowledge : application in sample clustering

BACKGROUND: Classification of biological samples from gene expression data is a basic building block in solving several bioinformatics problems, such as diagnosing cancer and other diseases and planning proper treatment. One big challenge in sample classification is handling large dimen...

Bibliographic Details
Main Authors: Acharya, Sudipta; Saha, Sriparna; Nikhil, N.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5700545/
https://www.ncbi.nlm.nih.gov/pubmed/29166852
http://dx.doi.org/10.1186/s12859-017-1933-0
author Acharya, Sudipta
Saha, Sriparna
Nikhil, N.
collection PubMed
description BACKGROUND: Classification of biological samples from gene expression data is a basic building block in solving several bioinformatics problems, such as diagnosing cancer and other diseases and planning proper treatment. One big challenge in sample classification is handling high-dimensional, redundant gene expression data. To reduce the complexity of handling such data, gene/feature selection plays a major role. RESULTS: The current paper explores the use of biological knowledge acquired from the Gene Ontology database to select a proper subset of genes that can then participate in the clustering of samples. The proposed feature selection technique is unsupervised, as it does not use any class label information during gene selection. Finally, a multi-objective clustering approach is deployed to cluster the available samples in the reduced gene space. CONCLUSIONS: Reported results show that considering biological knowledge in gene selection not only reduces the dimensionality of the feature space to a great extent but also improves the accuracy of sample classification. The reduced gene space obtained is validated using strong biological significance tests. To demonstrate the superiority of the proposed gene selection based sample clustering technique, a thorough comparative analysis against state-of-the-art techniques has also been performed.
format Online
Article
Text
id pubmed-5700545
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5700545 2017-12-01 Unsupervised gene selection using biological knowledge : application in sample clustering Acharya, Sudipta Saha, Sriparna Nikhil, N. BMC Bioinformatics Methodology Article
BioMed Central 2017-11-22 /pmc/articles/PMC5700545/ /pubmed/29166852 http://dx.doi.org/10.1186/s12859-017-1933-0 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Unsupervised gene selection using biological knowledge : application in sample clustering
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5700545/
https://www.ncbi.nlm.nih.gov/pubmed/29166852
http://dx.doi.org/10.1186/s12859-017-1933-0