
Statistical Approach for Biologically Relevant Gene Selection from High-Throughput Gene Expression Data



Bibliographic Details
Main Authors: Das, Samarendra, Rai, Shesh N.
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7712650/
https://www.ncbi.nlm.nih.gov/pubmed/33286973
http://dx.doi.org/10.3390/e22111205
_version_ 1783618416920231936
author Das, Samarendra
Rai, Shesh N.
author_facet Das, Samarendra
Rai, Shesh N.
author_sort Das, Samarendra
collection PubMed
description Selection of biologically relevant genes from high-dimensional expression data is a key research problem in gene expression genomics. Most available gene selection methods are based on either a relevancy or a redundancy measure, and are usually judged by post-selection classification accuracy. These methods rank genes on a single high-dimensional expression dataset, which leads to the selection of spuriously associated and redundant genes. Hence, we developed a statistical approach that combines a support vector machine with Maximum Relevance and Minimum Redundancy under a sound statistical setup for the selection of biologically relevant genes. Here, genes were selected through statistical significance values computed using a nonparametric test statistic under a bootstrap-based subject sampling model. Further, a systematic and rigorous evaluation of the proposed approach against nine existing competitive methods was carried out on six real crop gene expression datasets. This performance analysis was carried out under three comparison settings, i.e., subject classification, biologically relevant criteria based on quantitative trait loci, and gene ontology. Our analytical results showed that the proposed approach selects genes that are more biologically relevant than those selected by the existing methods. Moreover, the proposed approach was also found to be better than the competing existing methods. The proposed statistical approach provides a framework for combining filter and wrapper methods of gene selection.
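The bootstrap-and-test idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the per-gene score below is a simple standardized mean difference standing in for the combined support vector machine and Maximum Relevance and Minimum Redundancy score, the sign test is one possible nonparametric test, and every function name and parameter (`relevance_scores`, `sign_test_pvalue`, `bootstrap_gene_pvalues`, `n_boot`, `seed`) is hypothetical. Bootstrap replicates are treated as independent trials for simplicity.

```python
import math
import random
from statistics import mean, pstdev


def relevance_scores(X, y):
    """Stand-in gene score (assumption, not SVM-MRMR):
    absolute difference of class means divided by the overall spread."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 1]
        b = [row[g] for row, lab in zip(X, y) if lab == 0]
        spread = pstdev(a + b) or 1.0  # guard against zero spread
        scores.append(abs(mean(a) - mean(b)) / spread)
    return scores


def sign_test_pvalue(k, n):
    """One-sided binomial tail P(K >= k) for K ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def bootstrap_gene_pvalues(X, y, n_boot=200, seed=0):
    """Resample subjects with replacement (within each class), rank genes
    on every resample, and test whether each gene lands in the top half
    of the ranking more often than chance would suggest."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    by_class = {lab: [i for i, l in enumerate(y) if l == lab] for lab in (0, 1)}
    wins = [0] * n_genes
    for _ in range(n_boot):
        # Stratified bootstrap: draw as many subjects per class as observed.
        idx = [rng.choice(members) for members in by_class.values() for _ in members]
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        scores = relevance_scores(Xb, yb)
        order = sorted(range(n_genes), key=lambda g: -scores[g])
        for pos, g in enumerate(order):
            if pos < n_genes / 2:
                wins[g] += 1
    return [sign_test_pvalue(w, n_boot) for w in wins]
```

Here `X` is a list of subjects, each a list of expression values per gene, and `y` holds 0/1 class labels. Genes with small p-values are the ones that rank highly across most resamples rather than on a single dataset, which is the property the description attributes to the proposed approach.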
format Online
Article
Text
id pubmed-7712650
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7712650 2021-02-24 Statistical Approach for Biologically Relevant Gene Selection from High-Throughput Gene Expression Data Das, Samarendra Rai, Shesh N. Entropy (Basel) Article MDPI 2020-10-25 /pmc/articles/PMC7712650/ /pubmed/33286973 http://dx.doi.org/10.3390/e22111205 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Das, Samarendra
Rai, Shesh N.
Statistical Approach for Biologically Relevant Gene Selection from High-Throughput Gene Expression Data
title Statistical Approach for Biologically Relevant Gene Selection from High-Throughput Gene Expression Data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7712650/
https://www.ncbi.nlm.nih.gov/pubmed/33286973
http://dx.doi.org/10.3390/e22111205
work_keys_str_mv AT dassamarendra statisticalapproachforbiologicallyrelevantgeneselectionfromhighthroughputgeneexpressiondata
AT raisheshn statisticalapproachforbiologicallyrelevantgeneselectionfromhighthroughputgeneexpressiondata