FexSplice: A LightGBM-Based Model for Predicting the Splicing Effect of a Single Nucleotide Variant Affecting the First Nucleotide G of an Exon
Main authors: | Joudaki, Atefeh; Takeda, Jun-ichi; Masuda, Akio; Ode, Rikumo; Fujiwara, Koichi; Ohno, Kinji |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10531444/ https://www.ncbi.nlm.nih.gov/pubmed/37761905 http://dx.doi.org/10.3390/genes14091765 |
author | Joudaki, Atefeh; Takeda, Jun-ichi; Masuda, Akio; Ode, Rikumo; Fujiwara, Koichi; Ohno, Kinji |
---|---|
collection | PubMed |
description | Single nucleotide variants (SNVs) affecting the first nucleotide G of an exon (Fex-SNVs), which have been identified in various diseases, are mostly classified as missense or nonsense variants. Their effect on pre-mRNA splicing has seldom been analyzed, and no curated database is available. We previously reported that Fex-SNVs affect splicing when the polypyrimidine tract is short or degenerate. However, the splicing effects of Fex-SNVs cannot be readily predicted. Here, we scrutinized the available literature and identified 106 splicing-affecting Fex-SNVs based on experimental evidence. We similarly identified 106 neutral Fex-SNVs in the dbSNP database with a global minor allele frequency (MAF) greater than 0.01 and less than 0.50. We extracted 115 features representing the strength of splicing cis-elements and developed machine-learning models based on a support vector machine, a random forest, and gradient boosting to discriminate splicing-affecting from neutral Fex-SNVs. The gradient boosting-based LightGBM model outperformed the other two models, and the length and nucleotide composition of the polypyrimidine tract played critical roles in the discrimination. Recursive feature elimination showed that the LightGBM model using 15 features achieved the best performance in 10-fold cross-validation, with an accuracy of 0.80 ± 0.12 (mean ± SD), a Matthews correlation coefficient (MCC) of 0.57 ± 0.15, an area under the receiver operating characteristic curve (AUROC) of 0.86 ± 0.08, and an area under the precision–recall curve (AUPRC) of 0.87 ± 0.09. We developed a web service, named FexSplice, that accepts a genomic coordinate on either GRCh37/hg19 or GRCh38/hg38 and returns the predicted probability of aberrant splicing for each of the A, C, and T variants (G>A, G>C, and G>T). |
id | pubmed-10531444 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Genes (Basel), MDPI, 2023-09-06 |
license | © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
topic | Article |
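The abstract reports FexSplice's 10-fold cross-validation performance as the mean ± SD of four metrics: accuracy, MCC, AUROC, and AUPRC. To show what each number measures, here is a minimal standard-library Python sketch that computes the four metrics on held-out folds and summarizes them as mean ± SD. It is not the authors' evaluation code. Several choices are assumptions not stated in the record: the label coding (1 = splicing-affecting, 0 = neutral), the 0.5 decision threshold, the use of average precision as the AUPRC estimator, the sample standard deviation, and the toy fold data. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used here: MCC is 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auroc(y_true, scores):
    """AUROC as the chance that a random positive outscores a random negative.

    Ties count as one half. Assumes both classes occur in the fold
    (for example, with stratified folds).
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(y_true, scores):
    """AUPRC estimated by average precision over the score ranking."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive
    return precision_sum / hits if hits else 0.0


def summarize_folds(folds, threshold=0.5):
    """Mean and sample SD of each metric across held-out folds.

    `folds` is a list of (y_true, scores) pairs; `threshold` (an assumption)
    converts predicted probabilities into class labels for accuracy and MCC.
    """
    per_metric = {"accuracy": [], "MCC": [], "AUROC": [], "AUPRC": []}
    for y_true, scores in folds:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        per_metric["accuracy"].append(accuracy(y_true, y_pred))
        per_metric["MCC"].append(mcc(y_true, y_pred))
        per_metric["AUROC"].append(auroc(y_true, scores))
        per_metric["AUPRC"].append(auprc(y_true, scores))
    return {name: (mean(vals), stdev(vals)) for name, vals in per_metric.items()}


# Toy predictions for two folds (made-up numbers, not FexSplice output).
toy_folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
]
for name, (m, sd) in summarize_folds(toy_folds).items():
    print(f"{name}: {m:.2f} ± {sd:.2f}")
```

Accuracy and MCC depend on the chosen decision threshold. AUROC and AUPRC, in contrast, are computed only from the ranking of the predicted probabilities, so they do not depend on any threshold.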