Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree
Identifying genes related to Parkinson’s disease (PD) is an active research topic in biomedical analysis and plays a critical role in diagnosis and treatment. Recently, many studies have proposed different techniques for predicting disease-related genes. However, few of these techniques are des...
Main Authors: | Helmy, Marwa; Eldaydamony, Eman; Mekky, Nagham; Elmogy, Mohammed; Soliman, Hassan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200794/ https://www.ncbi.nlm.nih.gov/pubmed/35705654 http://dx.doi.org/10.1038/s41598-022-14127-8 |
_version_ | 1784728144011329536 |
---|---|
author | Helmy, Marwa; Eldaydamony, Eman; Mekky, Nagham; Elmogy, Mohammed; Soliman, Hassan |
author_facet | Helmy, Marwa; Eldaydamony, Eman; Mekky, Nagham; Elmogy, Mohammed; Soliman, Hassan |
author_sort | Helmy, Marwa |
collection | PubMed |
description | Identifying genes related to Parkinson’s disease (PD) is an active research topic in biomedical analysis and plays a critical role in diagnosis and treatment. Recently, many studies have proposed different techniques for predicting disease-related genes. However, few of these techniques are designed or developed for PD gene prediction. Most of these PD techniques identify only protein genes and discard long noncoding RNA (lncRNA) genes, which play an essential role in biological processes and in the transformation and development of diseases. This paper proposes a novel prediction system to identify protein and lncRNA genes related to PD that can aid in early diagnosis. First, we preprocessed the genes into DNA FASTA sequences from the University of California Santa Cruz (UCSC) genome browser and removed the redundancies. Second, we extracted significant features of the DNA FASTA sequences using the PyFeat method with AdaBoost for feature selection. These selected features achieved promising results compared with features extracted by some state-of-the-art feature extraction techniques. Finally, the features were fed to a gradient-boosted decision tree (GBDT) to diagnose different tested cases. Seven performance metrics were used to evaluate the proposed system. The proposed system achieved an average accuracy of 78.6%, an area under the curve (AUC) of 84.5%, an area under the precision-recall curve (AUPR) of 85.3%, an F1-score of 78.3%, a Matthews correlation coefficient (MCC) of 0.575, a sensitivity (SEN) of 77.1%, and a specificity (SPC) of 80.2%. The experiments demonstrate promising results compared with other systems. The predicted top-rank protein and lncRNA genes are verified based on a literature review. |
format | Online Article Text |
id | pubmed-9200794 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9200794 2022-06-17 Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree Helmy, Marwa; Eldaydamony, Eman; Mekky, Nagham; Elmogy, Mohammed; Soliman, Hassan Sci Rep Article Identifying genes related to Parkinson’s disease (PD) is an active research topic in biomedical analysis and plays a critical role in diagnosis and treatment. Recently, many studies have proposed different techniques for predicting disease-related genes. However, few of these techniques are designed or developed for PD gene prediction. Most of these PD techniques identify only protein genes and discard long noncoding RNA (lncRNA) genes, which play an essential role in biological processes and in the transformation and development of diseases. This paper proposes a novel prediction system to identify protein and lncRNA genes related to PD that can aid in early diagnosis. First, we preprocessed the genes into DNA FASTA sequences from the University of California Santa Cruz (UCSC) genome browser and removed the redundancies. Second, we extracted significant features of the DNA FASTA sequences using the PyFeat method with AdaBoost for feature selection. These selected features achieved promising results compared with features extracted by some state-of-the-art feature extraction techniques. Finally, the features were fed to a gradient-boosted decision tree (GBDT) to diagnose different tested cases. Seven performance metrics were used to evaluate the proposed system. The proposed system achieved an average accuracy of 78.6%, an area under the curve (AUC) of 84.5%, an area under the precision-recall curve (AUPR) of 85.3%, an F1-score of 78.3%, a Matthews correlation coefficient (MCC) of 0.575, a sensitivity (SEN) of 77.1%, and a specificity (SPC) of 80.2%. The experiments demonstrate promising results compared with other systems. The predicted top-rank protein and lncRNA genes are verified based on a literature review. Nature Publishing Group UK 2022-06-15 /pmc/articles/PMC9200794/ /pubmed/35705654 http://dx.doi.org/10.1038/s41598-022-14127-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Helmy, Marwa Eldaydamony, Eman Mekky, Nagham Elmogy, Mohammed Soliman, Hassan Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree |
title | Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9200794/ https://www.ncbi.nlm.nih.gov/pubmed/35705654 http://dx.doi.org/10.1038/s41598-022-14127-8 |
work_keys_str_mv | AT helmymarwa predictingparkinsondiseaserelatedgenesbasedonpyfeatandgradientboosteddecisiontree AT eldaydamonyeman predictingparkinsondiseaserelatedgenesbasedonpyfeatandgradientboosteddecisiontree AT mekkynagham predictingparkinsondiseaserelatedgenesbasedonpyfeatandgradientboosteddecisiontree AT elmogymohammed predictingparkinsondiseaserelatedgenesbasedonpyfeatandgradientboosteddecisiontree AT solimanhassan predictingparkinsondiseaserelatedgenesbasedonpyfeatandgradientboosteddecisiontree |
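
The abstract in this record reports accuracy, F1-score, MCC, sensitivity, and specificity. As a minimal sketch of how these threshold-based metrics are conventionally computed from a binary confusion matrix, the snippet below uses plain Python. The function names and the toy labels are illustrative assumptions and are not taken from the paper; AUC and AUPR are left out because they require ranked prediction scores rather than hard labels.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity, F1, and MCC as a dict.

    Any ratio with a zero denominator is reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy labels, purely illustrative (not data from the paper).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(binary_metrics(*confusion_counts(y_true, y_pred)))
```

MCC is reported on a scale from -1 to 1 rather than as a percentage, which is why the abstract gives it as 0.575 while the other metrics are percentages.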