A curated census of pathogenic and likely pathogenic UTR variants and evaluation of deep learning models for variant effect prediction
Introduction: Variants in 5′ and 3′ untranslated regions (UTR) contribute to rare disease. While predictive algorithms to assist in classifying pathogenicity can potentially be highly valuable, the utility of these tools is often unclear, as it depends on carefully selected training and validation c...
Main authors: | Bohn, Emma; Lau, Tammy T. Y.; Wagih, Omar; Masud, Tehmina; Merico, Daniele |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Molecular Biosciences |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10517338/ https://www.ncbi.nlm.nih.gov/pubmed/37745687 http://dx.doi.org/10.3389/fmolb.2023.1257550 |
_version_ | 1785109298256281600 |
---|---|
author | Bohn, Emma; Lau, Tammy T. Y.; Wagih, Omar; Masud, Tehmina; Merico, Daniele |
author_facet | Bohn, Emma; Lau, Tammy T. Y.; Wagih, Omar; Masud, Tehmina; Merico, Daniele |
author_sort | Bohn, Emma |
collection | PubMed |
description | Introduction: Variants in 5′ and 3′ untranslated regions (UTR) contribute to rare disease. While predictive algorithms to assist in classifying pathogenicity can potentially be highly valuable, the utility of these tools is often unclear, as it depends on carefully selected training and validation conditions. To address this, we developed a high-confidence set of pathogenic (P) and likely pathogenic (LP) variants and assessed deep learning (DL) models for predicting their molecular effects. Methods: 3′ and 5′ UTR variants documented as P or LP (P/LP) were obtained from ClinVar and refined by reviewing the annotated variant effect and reassessing evidence of pathogenicity following published guidelines. Prediction scores from sequence-based DL models were compared between three groups: P/LP variants acting through the mechanism for which the model was designed (model-matched), those operating through other mechanisms (model-mismatched), and putative benign variants. PhyloP was used to compare conservation scores between P/LP and putative benign variants. Results: 295 3′ and 188 5′ UTR variants were obtained from ClinVar, of which 26 3′ and 68 5′ UTR variants were classified as P/LP. Prediction scores from DL models differed significantly between model-matched P/LP variants and both putative benign variants and model-mismatched P/LP variants, as well as between all P/LP variants and putative benign variants. PhyloP conservation scores were significantly higher among P/LP variants than among putative benign variants for both the 3′ and 5′ UTR. Discussion: In conclusion, we present a high-confidence set of P/LP 3′ and 5′ UTR variants spanning a range of mechanisms and supported by detailed pathogenicity and molecular mechanism evidence curation. Predictions from DL models further substantiate these classifications. These datasets will support further development and validation of DL algorithms designed to predict the functional impact of variants that may be implicated in rare disease. |
format | Online Article Text |
id | pubmed-10517338 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10517338 2023-09-24 Front Mol Biosci Molecular Biosciences Frontiers Media S.A. 2023-09-08 /pmc/articles/PMC10517338/ /pubmed/37745687 http://dx.doi.org/10.3389/fmolb.2023.1257550 Text en Copyright © 2023 Bohn, Lau, Wagih, Masud and Merico. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | A curated census of pathogenic and likely pathogenic UTR variants and evaluation of deep learning models for variant effect prediction |
topic | Molecular Biosciences |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10517338/ https://www.ncbi.nlm.nih.gov/pubmed/37745687 http://dx.doi.org/10.3389/fmolb.2023.1257550 |