
Machine Learning Based Computational Gene Selection Models: A Survey, Performance Evaluation, Open Issues, and Future Research Directions


Bibliographic Details
Main authors: Mahendran, Nivedhitha; Durai Raj Vincent, P. M.; Srinivasan, Kathiravan; Chang, Chuan-Yu
Format: Online Article, Text
Language: English
Published: Frontiers Media S.A., 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7758324/
https://www.ncbi.nlm.nih.gov/pubmed/33362861
http://dx.doi.org/10.3389/fgene.2020.603808
Collection: PubMed
Description: Gene expression is the process by which the information encoded in genes is used to produce the proteins that determine the physical characteristics of living beings. It takes place in two steps, transcription and translation: with the help of enzymes, information flows from DNA to RNA and then to protein, and the end products are proteins and other biochemical molecules. Many technologies can capture gene expression from DNA or RNA; one such technique is the DNA microarray. Besides being expensive, DNA microarrays generate high-dimensional data with very small sample sizes, and a learning model trained on such data is prone to overfitting. This problem can be addressed by substantially reducing the dimensionality of the data. In recent years, machine learning has gained popularity in genomic studies, and many machine learning-based gene selection approaches have been proposed in the literature to improve the precision of dimensionality reduction. This paper provides an extensive review of recent work on machine learning-based gene selection, together with a performance analysis. The study categorizes feature selection algorithms as supervised, unsupervised, or semi-supervised. Recent work on reducing the number of features for diagnosing tumors is discussed in detail, and the performance of several of the discussed methods is analyzed. The study also lists and briefly discusses the open issues in handling high-dimensional, low-sample-size data.
ID: pubmed-7758324
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Front Genet (Frontiers in Genetics)
Published online: 2020-12-10
Copyright © 2020 Mahendran, Durai Raj Vincent, Srinivasan and Chang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Subjects: Genetics