FexSplice: A LightGBM-Based Model for Predicting the Splicing Effect of a Single Nucleotide Variant Affecting the First Nucleotide G of an Exon

Single nucleotide variants (SNVs) affecting the first nucleotide G of an exon (Fex-SNVs) identified in various diseases are mostly recognized as missense or nonsense variants. Their effect on pre-mRNA splicing has seldom been analyzed, and no curated database is available. We previously reported tha...

Bibliographic Details
Main authors: Joudaki, Atefeh; Takeda, Jun-ichi; Masuda, Akio; Ode, Rikumo; Fujiwara, Koichi; Ohno, Kinji
Format: Online Article Text
Language: English
Published: MDPI 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10531444/
https://www.ncbi.nlm.nih.gov/pubmed/37761905
http://dx.doi.org/10.3390/genes14091765
author Joudaki, Atefeh
Takeda, Jun-ichi
Masuda, Akio
Ode, Rikumo
Fujiwara, Koichi
Ohno, Kinji
collection PubMed
description Single nucleotide variants (SNVs) affecting the first nucleotide G of an exon (Fex-SNVs) identified in various diseases are mostly recognized as missense or nonsense variants. Their effect on pre-mRNA splicing has seldom been analyzed, and no curated database is available. We previously reported that Fex-SNVs affect splicing when the polypyrimidine tract is short or degenerate. However, the splicing effects of Fex-SNVs cannot be readily predicted. We scrutinized the available literature and identified 106 splicing-affecting Fex-SNVs based on experimental evidence. We similarly identified 106 neutral Fex-SNVs in the dbSNP database with a global minor allele frequency (MAF) of more than 0.01 and less than 0.50. We extracted 115 features representing the strength of splicing cis-elements and developed machine-learning models with support vector machine, random forest, and gradient boosting to discriminate between splicing-affecting and neutral Fex-SNVs. The gradient boosting-based LightGBM outperformed the other two models, and the length and nucleotide composition of the polypyrimidine tract played critical roles in the discrimination. Recursive feature elimination showed that the LightGBM model using 15 features achieved the best performance in 10-fold cross-validation, with an accuracy of 0.80 ± 0.12 (mean ± SD), a Matthews correlation coefficient (MCC) of 0.57 ± 0.15, an area under the receiver operating characteristic curve (AUROC) of 0.86 ± 0.08, and an area under the precision–recall curve (AUPRC) of 0.87 ± 0.09. We developed a web service, named FexSplice, that accepts a genomic coordinate on either GRCh37/hg19 or GRCh38/hg38 and returns the predicted probability of aberrant splicing for the A, C, and T variants.
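The evaluation protocol named in the description (a balanced set of 106 + 106 variants, 10-fold cross-validation, accuracy and MCC reported as mean ± SD) can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names `stratified_folds`, `accuracy`, `mcc`, and `summarize` are hypothetical, no classifier is trained, and using the sample standard deviation is an assumption, since the record does not say which SD was reported.

```python
import math
import random
import statistics


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin over the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = affecting, 0 = neutral)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def summarize(per_fold_scores):
    """Mean and sample SD over folds (assumed SD definition)."""
    return statistics.mean(per_fold_scores), statistics.stdev(per_fold_scores)


# Label layout matching the counts in the description: 106 affecting, 106 neutral.
labels = [1] * 106 + [0] * 106
folds = stratified_folds(labels, k=10)
```

In a full pipeline, each fold would serve once as the held-out set, a model would be fitted on the remaining nine folds, and `summarize` would be applied to the ten per-fold scores.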
format Online
Article
Text
id pubmed-10531444
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10531444 2023-09-28 FexSplice: A LightGBM-Based Model for Predicting the Splicing Effect of a Single Nucleotide Variant Affecting the First Nucleotide G of an Exon Joudaki, Atefeh; Takeda, Jun-ichi; Masuda, Akio; Ode, Rikumo; Fujiwara, Koichi; Ohno, Kinji Genes (Basel) Article MDPI 2023-09-06 /pmc/articles/PMC10531444/ /pubmed/37761905 http://dx.doi.org/10.3390/genes14091765 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title FexSplice: A LightGBM-Based Model for Predicting the Splicing Effect of a Single Nucleotide Variant Affecting the First Nucleotide G of an Exon
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10531444/
https://www.ncbi.nlm.nih.gov/pubmed/37761905
http://dx.doi.org/10.3390/genes14091765