Identification of discriminant features from stationary pattern of nucleotide bases and their application to essential gene classification

Introduction: Essential genes are indispensable for the survival of a species. They form a family linked to critical cellular activities, coding for proteins that regulate central metabolism, gene translation, deoxyribonucleic acid replication, and fundame...

Bibliographic Details
Main authors: Rout, Ranjeet Kumar; Umer, Saiyed; Khandelwal, Monika; Pati, Smitarani; Mallik, Saurav; Balabantaray, Bunil Kumar; Qin, Hong
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10156977/
https://www.ncbi.nlm.nih.gov/pubmed/37152988
http://dx.doi.org/10.3389/fgene.2023.1154120
author Rout, Ranjeet Kumar
Umer, Saiyed
Khandelwal, Monika
Pati, Smitarani
Mallik, Saurav
Balabantaray, Bunil Kumar
Qin, Hong
collection PubMed
description Introduction: Essential genes are indispensable for the survival of a species. They form a family linked to critical cellular activities, coding for proteins that regulate central metabolism, gene translation, deoxyribonucleic acid replication, and fundamental cellular structure, and that facilitate intracellular and extracellular transport. Essential genes preserve crucial genomic information that may hold the key to a detailed knowledge of life and evolution. Because of this relevance, essential gene studies have long been regarded as a vital topic in computational biology. An essential gene is composed of adenine, guanine, cytosine, and thymine in various combinations. Methods: This paper presents a novel method of extracting information on the stationary patterns of the nucleotides adenine, guanine, cytosine, and thymine in each gene. For this purpose, co-occurrence matrices are derived that provide the statistical distribution of stationary patterns of nucleotides in the genes, which helps establish the relationship between the nucleotides. To extract discriminant features, energy, entropy, homogeneity, contrast, and dissimilarity are computed from each co-occurrence matrix and then concatenated across all matrices to form a feature vector representing each essential gene. Finally, supervised machine learning algorithms are applied for essential gene classification based on the extracted fixed-dimensional feature vectors. Results: For comparison, existing state-of-the-art feature representation techniques such as Shannon entropy (SE), Hurst exponent (HE), fractal dimension (FD), and their combinations have been utilized. Discussion: Extensive experiments on classifying the essential genes of five species show the robustness and effectiveness of the proposed methodology.
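The feature-extraction idea summarized in the abstract can be illustrated with a minimal sketch. The record does not give the paper's exact definitions, so everything below is an assumption made for illustration: a "stationary pattern" is taken to be a pair of bases separated by a fixed offset, the bases are indexed in the order A, C, G, T, and the five features use the usual texture-analysis formulas. The names `cooccurrence`, `texture_features`, and `gene_features` are hypothetical.

```python
import math

BASES = "ACGT"  # assumed index order; contrast-like features depend on it


def cooccurrence(seq, offset=1):
    """Normalized 4x4 matrix of (seq[i], seq[i + offset]) base pairs."""
    idx = {b: k for k, b in enumerate(BASES)}
    counts = [[0] * 4 for _ in range(4)]
    total = 0
    for a, b in zip(seq, seq[offset:]):
        if a in idx and b in idx:  # skip ambiguous symbols such as N
            counts[idx[a]][idx[b]] += 1
            total += 1
    if total == 0:
        return [[0.0] * 4 for _ in range(4)]
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Energy, entropy, homogeneity, contrast, dissimilarity of matrix p."""
    energy = entropy = homogeneity = contrast = dissimilarity = 0.0
    for i in range(4):
        for j in range(4):
            v = p[i][j]
            energy += v * v                      # sum of squared probabilities
            if v > 0:
                entropy -= v * math.log2(v)      # Shannon entropy in bits
            homogeneity += v / (1 + (i - j) ** 2)
            contrast += v * (i - j) ** 2
            dissimilarity += v * abs(i - j)
    return [energy, entropy, homogeneity, contrast, dissimilarity]


def gene_features(seq, offsets=(1, 2, 3)):
    """Concatenate the five features over one matrix per assumed offset."""
    seq = seq.upper()
    vec = []
    for d in offsets:
        vec.extend(texture_features(cooccurrence(seq, d)))
    return vec
```

Under these assumptions, `gene_features` returns a vector of fixed length (five features per offset) regardless of gene length, so the vectors can be passed to any supervised classifier. The choice of offsets and base ordering here is arbitrary and may differ from the published method.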
format Online
Article
Text
id pubmed-10156977
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10156977 2023-05-05 Front Genet Genetics Frontiers Media S.A. 2023-04-20 /pmc/articles/PMC10156977/ /pubmed/37152988 http://dx.doi.org/10.3389/fgene.2023.1154120 Text en Copyright © 2023 Rout, Umer, Khandelwal, Pati, Mallik, Balabantaray and Qin. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Identification of discriminant features from stationary pattern of nucleotide bases and their application to essential gene classification
topic Genetics