An Ensemble Approach to Predict the Pathogenicity of Synonymous Variants
Single-nucleotide variants (SNVs) are a major form of genetic variation in the human genome that contribute to various disorders. There are two types of SNVs, namely non-synonymous (missense) variants (nsSNVs) and synonymous variants (sSNVs), predominantly involved in RNA processing or gene regulati...
Main authors: | Ranganathan Ganakammal, Satishkumar; Alexov, Emil |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7565489/ https://www.ncbi.nlm.nih.gov/pubmed/32967157 http://dx.doi.org/10.3390/genes11091102 |
_version_ | 1783595944019755008 |
---|---|
author | Ranganathan Ganakammal, Satishkumar; Alexov, Emil |
collection | PubMed |
description | Single-nucleotide variants (SNVs) are a major form of genetic variation in the human genome and contribute to various disorders. There are two types of SNVs: non-synonymous (missense) variants (nsSNVs) and synonymous variants (sSNVs), the latter predominantly involved in RNA processing or gene regulation. Unlike nsSNVs, sSNVs do not alter the amino acid sequence, which makes them challenging candidates for downstream functional studies. Numerous computational methods have been developed to evaluate the clinical impact of nsSNVs, but very few are available for understanding the effects of sSNVs. For this analysis, we downloaded sSNVs from the ClinVar database along with various features such as conservation, DNA-RNA, and splicing properties. We performed feature selection and implemented an ensemble random forest (RF) classification algorithm to build a classifier that predicts the pathogenicity of sSNVs. We demonstrate that the ensemble predictor with 20 selected features enhances the classification of sSNVs into two categories, pathogenic and benign, with high accuracy (87%), precision (79%), and recall (91%). Furthermore, we used this prediction model to reclassify sSNVs of unknown clinical significance. Finally, the method is robust and can be used to predict the effect of other unknown sSNVs. |
format | Online Article Text |
id | pubmed-7565489 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-7565489 2020-10-26 An Ensemble Approach to Predict the Pathogenicity of Synonymous Variants Ranganathan Ganakammal, Satishkumar; Alexov, Emil Genes (Basel) Article MDPI 2020-09-21 /pmc/articles/PMC7565489/ /pubmed/32967157 http://dx.doi.org/10.3390/genes11091102 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | An Ensemble Approach to Predict the Pathogenicity of Synonymous Variants |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7565489/ https://www.ncbi.nlm.nih.gov/pubmed/32967157 http://dx.doi.org/10.3390/genes11091102 |
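
The description in this record reports accuracy, precision, and recall for a two-class (pathogenic vs. benign) classifier. As a minimal sketch of how those three metrics are conventionally computed from true and predicted labels, the Python snippet below may help. The function name `binary_metrics`, the label encoding (1 = pathogenic, 0 = benign), and the toy labels are assumptions made for illustration; they are not the authors' code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for binary labels.

    Assumed encoding (hypothetical): 1 = pathogenic, 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn

    return {
        # Fraction of all variants classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of predicted-pathogenic variants that are truly pathogenic.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Fraction of truly pathogenic variants that were detected.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels invented for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

A recall higher than precision, as reported in the description, would mean that false positives (benign variants called pathogenic) occur at a higher rate relative to true positives than missed pathogenic variants do.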