A consensus multi-view multi-objective gene selection approach for improved sample classification

BACKGROUND: In the field of computational biology, analyzing complex data helps to extract relevant biological information. Sample classification of gene expression data is one such popular bio-data analysis technique. However, the presence of a large number of irrelevant/redundant genes in expressi...

Bibliographic Details
Main Authors:	Acharya, Sudipta; Cui, Laizhong; Pan, Yi
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7495900/ https://www.ncbi.nlm.nih.gov/pubmed/32938388 http://dx.doi.org/10.1186/s12859-020-03681-5

author	Acharya, Sudipta; Cui, Laizhong; Pan, Yi
collection	PubMed
description	BACKGROUND: In computational biology, analyzing complex data helps to extract relevant biological information. Sample classification of gene expression data is one popular bio-data analysis technique. However, the large number of irrelevant or redundant genes in expression data makes sample classification algorithms work inefficiently. Feature selection is a dimensionality reduction technique that helps to maximize the effectiveness of any sample classification algorithm. Recent advances in biotechnology have extended biological data to include multiple modalities, or views. Different ‘omics’ resources capture various equally important biological properties of entities. However, most existing feature selection methodologies consider only one of the multiple biological resources. Consequently, crucial aspects of the available biological knowledge that could further improve feature selection may be ignored. RESULTS: In this work, we propose a Consensus Multi-View Multi-objective Clustering-based feature selection algorithm called CMVMC. Three controlled genomic and proteomic resources, namely gene expression, Gene Ontology (GO), and the protein-protein interaction network (PPIN), are used to build two independent views. Multi-objective consensus clustering is applied within the proposed gene selection method to satisfy both views. The Multiple Tissues and Yeast gene expression data sets, from two different organisms (Homo sapiens and Saccharomyces cerevisiae, respectively), are chosen for the experiments. As the end product of CMVMC, a reduced set of relevant and non-redundant genes is found for each data set. These genes are then used for sample classification.
CONCLUSIONS: The experimental study on the chosen data sets shows that the proposed feature selection method improves sample classification accuracy and substantially reduces the gene space. For the Multiple Tissues data set, CMVMC reduces the number of genes (features) from 5565 to 41, with 92.73% sample classification accuracy. For the Yeast data set, the number of genes is reduced from 2884 to 10, with 95.84% sample classification accuracy. Two internal cluster validity indices, Silhouette and Davies-Bouldin (DB), and one external validity index, Classification Accuracy (CA), are used for the comparative study. The reported results are further validated with a well-known biological significance test and visualization tool.
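The gene-space reduction reported in the abstract can be expressed as a percentage of features removed. The short sketch below only redoes that arithmetic from the gene counts quoted above; the function name `reduction_percent` and the `datasets` dictionary are illustrative assumptions and are not part of the CMVMC method or the article.

```python
def reduction_percent(original: int, selected: int) -> float:
    """Percentage of features removed when going from `original` to `selected` genes."""
    if original <= 0 or not 0 <= selected <= original:
        raise ValueError("expected 0 <= selected <= original and original > 0")
    return 100.0 * (original - selected) / original


# Gene counts as quoted in the abstract: (genes before selection, genes after selection).
datasets = {
    "Multiple Tissues": (5565, 41),
    "Yeast": (2884, 10),
}

for name, (n_original, n_selected) in datasets.items():
    pct = reduction_percent(n_original, n_selected)
    print(f"{name}: {n_original} -> {n_selected} genes ({pct:.2f}% removed)")
```

Under these counts, roughly 99.26% of the genes are removed for Multiple Tissues and roughly 99.65% for Yeast.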
format	Online Article Text
id	pubmed-7495900
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7495900 2020-09-23 A consensus multi-view multi-objective gene selection approach for improved sample classification Acharya, Sudipta; Cui, Laizhong; Pan, Yi BMC Bioinformatics Methodology BioMed Central 2020-09-17 /pmc/articles/PMC7495900/ /pubmed/32938388 http://dx.doi.org/10.1186/s12859-020-03681-5 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	A consensus multi-view multi-objective gene selection approach for improved sample classification
topic	Methodology
