Benchmarking splice variant prediction algorithms using massively parallel splicing assays

Bibliographic Details
Main authors: Smith, Cathy; Kitzman, Jacob O.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10187268/
https://www.ncbi.nlm.nih.gov/pubmed/37205456
http://dx.doi.org/10.1101/2023.05.04.539398
author Smith, Cathy
Kitzman, Jacob O.
collection PubMed
description BACKGROUND: Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because these predictors are primarily validated using clinical variant sets heavily biased toward known canonical splice site mutations, it remains unclear how well their performance generalizes. RESULTS: We benchmarked eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compared experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms’ concordance with MPSA measurements, and with each other, was lower for exonic than for intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieved the best overall performance at distinguishing disruptive from neutral variants. Controlling for overall call rate genome-wide, SpliceAI and Pangolin also showed superior overall sensitivity for identifying SDVs. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation. We suggest strategies for optimal splice effect prediction in the face of these issues. CONCLUSION: SpliceAI and Pangolin showed the best overall performance among the predictors tested; however, improvements in splice effect prediction are still needed, especially within exons.
format Online
Article
Text
id pubmed-10187268
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10187268 2023-05-17 Benchmarking splice variant prediction algorithms using massively parallel splicing assays Smith, Cathy Kitzman, Jacob O. bioRxiv Article Cold Spring Harbor Laboratory 2023-05-07 /pmc/articles/PMC10187268/ /pubmed/37205456 http://dx.doi.org/10.1101/2023.05.04.539398 Text en This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Benchmarking splice variant prediction algorithms using massively parallel splicing assays
topic Article