STR-based feature extraction and selection for genetic feature discovery in neurological disease genes
Gene expression, often determined by single nucleotide polymorphisms, short repeated sequences known as short tandem repeats (STRs), structural variants, and environmental factors, provides means for an organism to produce gene products necessary to live. Variation in expression levels, sometimes kn...
Main Authors: | Dhaliwal, Jasbir; Wagner, John |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9922266/ https://www.ncbi.nlm.nih.gov/pubmed/36774368 http://dx.doi.org/10.1038/s41598-023-29376-4 |
_version_ | 1784887506821447680 |
---|---|
author | Dhaliwal, Jasbir Wagner, John |
author_facet | Dhaliwal, Jasbir Wagner, John |
author_sort | Dhaliwal, Jasbir |
collection | PubMed |
description | Gene expression, often determined by single nucleotide polymorphisms, short repeated sequences known as short tandem repeats (STRs), structural variants, and environmental factors, provides means for an organism to produce gene products necessary to live. Variation in expression levels, sometimes known as enrichment patterns, has been associated with disease progression. Thus, the STR enrichment patterns have recently gained interest as potential genetic markers for disease progression. However, to the best of our knowledge, we are unaware of any study that evaluates and explores STRs, particularly trinucleotide sequences, as machine learning features for classifying neurological disease genes for the purpose of discovering genetic features. Thus, in this paper, we proposed a new metric and a novel feature extraction and selection algorithm based on statistically significant STR-based features and their respective enrichment patterns to create a statistically significant feature set. The proposed new metric has shown that the neurological disease family genes have a non-random AA, AT, TA, TG, and TT enrichment pattern. This is an important result, as it supports prior research that has established that certain trinucleotides, such as AAT, ATA, ATT, TAT, and TTA, are favored during protein misfolding. In contrast, trinucleotides, such as TAA, TAG, and TGA, are favored during premature termination codon mutations as they are stop codons. This suggests that the metric has the potential to identify patterns that may be genetic features in a sample of neurological genes. Moreover, the practical performance and high prediction results of the statistically significant STR-based feature set indicate that variations in STR enrichment patterns can distinguish neurological disease genes. In conclusion, the proposed approach may have the potential to discover differential genetic features for other diseases. |
format | Online Article Text |
id | pubmed-9922266 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-99222662023-02-13 STR-based feature extraction and selection for genetic feature discovery in neurological disease genes Dhaliwal, Jasbir Wagner, John Sci Rep Article Gene expression, often determined by single nucleotide polymorphisms, short repeated sequences known as short tandem repeats (STRs), structural variants, and environmental factors, provides means for an organism to produce gene products necessary to live. Variation in expression levels, sometimes known as enrichment patterns, has been associated with disease progression. Thus, the STR enrichment patterns have recently gained interest as potential genetic markers for disease progression. However, to the best of our knowledge, we are unaware of any study that evaluates and explores STRs, particularly trinucleotide sequences, as machine learning features for classifying neurological disease genes for the purpose of discovering genetic features. Thus, in this paper, we proposed a new metric and a novel feature extraction and selection algorithm based on statistically significant STR-based features and their respective enrichment patterns to create a statistically significant feature set. The proposed new metric has shown that the neurological disease family genes have a non-random AA, AT, TA, TG, and TT enrichment pattern. This is an important result, as it supports prior research that has established that certain trinucleotides, such as AAT, ATA, ATT, TAT, and TTA, are favored during protein misfolding. In contrast, trinucleotides, such as TAA, TAG, and TGA, are favored during premature termination codon mutations as they are stop codons. This suggests that the metric has the potential to identify patterns that may be genetic features in a sample of neurological genes. Moreover, the practical performance and high prediction results of the statistically significant STR-based feature set indicate that variations in STR enrichment patterns can distinguish neurological disease genes. 
In conclusion, the proposed approach may have the potential to discover differential genetic features for other diseases. Nature Publishing Group UK 2023-02-11 /pmc/articles/PMC9922266/ /pubmed/36774368 http://dx.doi.org/10.1038/s41598-023-29376-4 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Dhaliwal, Jasbir Wagner, John STR-based feature extraction and selection for genetic feature discovery in neurological disease genes |
title | STR-based feature extraction and selection for genetic feature discovery in neurological disease genes |
title_full | STR-based feature extraction and selection for genetic feature discovery in neurological disease genes |
title_fullStr | STR-based feature extraction and selection for genetic feature discovery in neurological disease genes |
title_full_unstemmed | STR-based feature extraction and selection for genetic feature discovery in neurological disease genes |
title_short | STR-based feature extraction and selection for genetic feature discovery in neurological disease genes |
title_sort | str-based feature extraction and selection for genetic feature discovery in neurological disease genes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9922266/ https://www.ncbi.nlm.nih.gov/pubmed/36774368 http://dx.doi.org/10.1038/s41598-023-29376-4 |
work_keys_str_mv | AT dhaliwaljasbir strbasedfeatureextractionandselectionforgeneticfeaturediscoveryinneurologicaldiseasegenes AT wagnerjohn strbasedfeatureextractionandselectionforgeneticfeaturediscoveryinneurologicaldiseasegenes |
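
The description field above refers to non-random dinucleotide (AA, AT, TA, TG, TT) and trinucleotide enrichment patterns. As a rough illustration only, the Python sketch below counts overlapping k-mers in a DNA string and reports an observed/expected ratio for each one. This is not the metric proposed in the article, which the record does not define. The uniform background (every k-mer equally likely) and the function names `kmer_counts` and `enrichment` are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers that consist only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def enrichment(seq: str, k: int) -> dict:
    """Observed/expected ratio per k-mer.

    Assumption: the expected count comes from a uniform background,
    i.e. each of the 4**k possible k-mers is equally likely.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    expected = total / 4 ** k
    return {
        "".join(p): counts["".join(p)] / expected
        for p in product("ACGT", repeat=k)
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, not data from the article.
    ratios = enrichment("ATATATTTAATG", 2)
    for kmer, ratio in sorted(ratios.items(), key=lambda kv: -kv[1])[:5]:
        print(kmer, round(ratio, 2))
```

A ratio above 1 means the k-mer occurs more often than the uniform background predicts. A real analysis would need a background model fitted to the genome and a significance test, neither of which is attempted here.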