An Ensemble Approach to Predict the Pathogenicity of Synonymous Variants


Bibliographic Details
Main authors: Ranganathan Ganakammal, Satishkumar; Alexov, Emil
Format: Online Article Text
Language: English
Published: MDPI 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7565489/
https://www.ncbi.nlm.nih.gov/pubmed/32967157
http://dx.doi.org/10.3390/genes11091102
author Ranganathan Ganakammal, Satishkumar
Alexov, Emil
collection PubMed
description Single-nucleotide variants (SNVs) are a major form of genetic variation in the human genome and contribute to various disorders. SNVs fall into two types: non-synonymous (missense) variants (nsSNVs) and synonymous variants (sSNVs), the latter predominantly involved in RNA processing or gene regulation. Unlike nsSNVs, sSNVs do not alter the amino acid sequence, which makes them challenging candidates for downstream functional studies. Numerous computational methods have been developed to evaluate the clinical impact of nsSNVs, but very few are available for understanding the effects of sSNVs. For this analysis, we downloaded sSNVs from the ClinVar database together with various features, such as conservation, DNA-RNA, and splicing properties. We performed feature selection and implemented an ensemble random forest (RF) classification algorithm to build a classifier that predicts the pathogenicity of sSNVs. We demonstrate that the ensemble predictor with 20 selected features enhances the classification of sSNVs into two categories, pathogenic and benign, with high accuracy (87%), precision (79%), and recall (91%). Furthermore, we used this prediction model to reclassify sSNVs of unknown clinical significance. Finally, the method is very robust and can be used to predict the effect of other unknown sSNVs.
format Online
Article
Text
id pubmed-7565489
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7565489 2020-10-26 An Ensemble Approach to Predict the Pathogenicity of Synonymous Variants Ranganathan Ganakammal, Satishkumar Alexov, Emil Genes (Basel) Article MDPI 2020-09-21 /pmc/articles/PMC7565489/ /pubmed/32967157 http://dx.doi.org/10.3390/genes11091102 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title An Ensemble Approach to Predict the Pathogenicity of Synonymous Variants
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7565489/
https://www.ncbi.nlm.nih.gov/pubmed/32967157
http://dx.doi.org/10.3390/genes11091102
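
The accuracy, precision, and recall quoted in the description are standard binary-classification metrics. The sketch below shows how they are usually computed, treating pathogenic as the positive class. The function name `binary_metrics` and the label lists are made up for illustration; they are not the article's code or data, and the record does not state how the authors computed their figures.

```python
def binary_metrics(y_true, y_pred, positive="pathogenic"):
    """Return (accuracy, precision, recall) for a two-class labelling.

    Hypothetical helper for illustration only; not taken from the article.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with `positive` as the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # correct among predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # found among actual positives
    return accuracy, precision, recall


# Toy labels (invented): three pathogenic and five benign variants.
y_true = ["pathogenic"] * 3 + ["benign"] * 5
y_pred = ["pathogenic", "pathogenic", "benign",
          "pathogenic", "benign", "benign", "benign", "benign"]

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

A recall higher than precision, as reported in the description, means the classifier misses few pathogenic variants at the cost of flagging some benign ones.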