Multi-view feature selection for identifying gene markers: a diversified biological data driven approach

BACKGROUND: In recent years, to investigate challenging bioinformatics problems, the utilization of multiple genomic and proteomic sources has become immensely popular among researchers. One such issue is feature or gene selection and identifying relevant and non-redundant marker genes from high dim...

Bibliographic Details
Main Authors:	Acharya, Sudipta, Cui, Laizhong, Pan, Yi
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7772934/ https://www.ncbi.nlm.nih.gov/pubmed/33375940 http://dx.doi.org/10.1186/s12859-020-03810-0

_version_	1783629967222898688
author	Acharya, Sudipta Cui, Laizhong Pan, Yi
author_sort	Acharya, Sudipta
collection	PubMed
description	BACKGROUND: In recent years, to investigate challenging bioinformatics problems, the utilization of multiple genomic and proteomic sources has become immensely popular among researchers. One such issue is feature or gene selection and identifying relevant and non-redundant marker genes from high dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm exploiting knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases with applications in specific epidemiology for a particular population. RESULTS: In the current article, we design the feature selection and marker gene detection as a multi-view multi-objective clustering problem. Regarding that, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce gene space without/minimally compromising the sample classification efficiency and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. CONCLUSION: A thorough comparative analysis has been performed with five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. Obtained results reveal the supremacy of the proposed method. Reported results are also validated through a proper biological significance test and heatmap plotting.
format	Online Article Text
id	pubmed-7772934
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7772934 2020-12-30 Multi-view feature selection for identifying gene markers: a diversified biological data driven approach Acharya, Sudipta Cui, Laizhong Pan, Yi BMC Bioinformatics Methodology 
BioMed Central 2020-12-30 /pmc/articles/PMC7772934/ /pubmed/33375940 http://dx.doi.org/10.1186/s12859-020-03810-0 Text en © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Multi-view feature selection for identifying gene markers: a diversified biological data driven approach
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7772934/ https://www.ncbi.nlm.nih.gov/pubmed/33375940 http://dx.doi.org/10.1186/s12859-020-03810-0
