Unsupervised gene selection using biological knowledge : application in sample clustering
BACKGROUND: Classification of biological samples of gene expression data is a basic building block in solving several problems in the field of bioinformatics like cancer and other disease diagnosis and making a proper treatment plan. One big challenge in sample classification is handling large dimen...
Main authors: | Acharya, Sudipta; Saha, Sriparna; Nikhil, N. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5700545/ https://www.ncbi.nlm.nih.gov/pubmed/29166852 http://dx.doi.org/10.1186/s12859-017-1933-0 |
_version_ | 1783281142892331008 |
---|---|
author | Acharya, Sudipta Saha, Sriparna Nikhil, N. |
author_facet | Acharya, Sudipta Saha, Sriparna Nikhil, N. |
author_sort | Acharya, Sudipta |
collection | PubMed |
description | BACKGROUND: Classification of biological samples of gene expression data is a basic building block in solving several problems in the field of bioinformatics, such as diagnosing cancer and other diseases and making a proper treatment plan. One big challenge in sample classification is handling high-dimensional and redundant gene expression data. To reduce the complexity of handling this high-dimensional data, gene/feature selection plays a major role. RESULTS: The current paper explores the use of biological knowledge acquired from the Gene Ontology database in selecting the proper subset of genes, which can then participate in the clustering of samples. The proposed feature selection technique is unsupervised in nature, as it does not utilize any class label information in the process of gene selection. At the end, a multi-objective clustering approach is deployed to cluster the available set of samples in the reduced gene space. CONCLUSIONS: Reported results show that considering biological knowledge in the gene selection technique not only reduces the feature space dimensionality to a great extent but also improves the accuracy of sample classification. The obtained reduced gene space is validated using strong biological significance tests. In order to prove the supremacy of our proposed gene selection based sample clustering technique, a thorough comparative analysis has also been performed with state-of-the-art techniques. |
format | Online Article Text |
id | pubmed-5700545 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-57005452017-12-01 Unsupervised gene selection using biological knowledge : application in sample clustering Acharya, Sudipta Saha, Sriparna Nikhil, N. BMC Bioinformatics Methodology Article BACKGROUND: Classification of biological samples of gene expression data is a basic building block in solving several problems in the field of bioinformatics like cancer and other disease diagnosis and making a proper treatment plan. One big challenge in sample classification is handling large dimensional and redundant gene expression data. To reduce the complexity of handling this high dimensional data, gene/feature selection plays a major role. RESULTS: The current paper explores the use of biological knowledge acquired from Gene Ontology database in selecting the proper subset of genes which can further participate in clustering of samples. The proposed feature selection technique is unsupervised in nature as it does not utilize any class label information in the process of gene selection. At the end, a multi-objective clustering approach is deployed to cluster the available set of samples in the reduced gene space. CONCLUSIONS: Reported results show that consideration of biological knowledge in gene selection technique not only reduces the feature space dimensionality in great extent but also improves the accuracy of sample classification. The obtained reduced gene space is validated using strong biological significance tests. In order to prove the supremacy of our proposed gene selection based sample clustering technique, a thorough comparative analysis has also been performed with state-of-the-art techniques. 
BioMed Central 2017-11-22 /pmc/articles/PMC5700545/ /pubmed/29166852 http://dx.doi.org/10.1186/s12859-017-1933-0 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Acharya, Sudipta Saha, Sriparna Nikhil, N. Unsupervised gene selection using biological knowledge : application in sample clustering |
title | Unsupervised gene selection using biological knowledge : application in sample clustering |
title_full | Unsupervised gene selection using biological knowledge : application in sample clustering |
title_fullStr | Unsupervised gene selection using biological knowledge : application in sample clustering |
title_full_unstemmed | Unsupervised gene selection using biological knowledge : application in sample clustering |
title_short | Unsupervised gene selection using biological knowledge : application in sample clustering |
title_sort | unsupervised gene selection using biological knowledge : application in sample clustering |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5700545/ https://www.ncbi.nlm.nih.gov/pubmed/29166852 http://dx.doi.org/10.1186/s12859-017-1933-0 |
work_keys_str_mv | AT acharyasudipta unsupervisedgeneselectionusingbiologicalknowledgeapplicationinsampleclustering AT sahasriparna unsupervisedgeneselectionusingbiologicalknowledgeapplicationinsampleclustering AT nikhiln unsupervisedgeneselectionusingbiologicalknowledgeapplicationinsampleclustering |