Machine Learning Prediction of Non-Coding Variant Impact in Human Retinal cis-Regulatory Elements
PURPOSE: Prior studies have demonstrated the significance of specific cis-regulatory variants in retinal disease; however, determining the functional impact of regulatory variants remains a major challenge. In this study, we utilized a machine learning approach, trained on epigenomic data from the a...
Main authors: | VandenBosch, Leah S.; Luu, Kelsey; Timms, Andrew E.; Challam, Shriya; Wu, Yue; Lee, Aaron Y.; Cherry, Timothy J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Association for Research in Vision and Ophthalmology, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9034719/ https://www.ncbi.nlm.nih.gov/pubmed/35435921 http://dx.doi.org/10.1167/tvst.11.4.16 |
_version_ | 1784693171019579392 |
---|---|
author | VandenBosch, Leah S. Luu, Kelsey Timms, Andrew E. Challam, Shriya Wu, Yue Lee, Aaron Y. Cherry, Timothy J. |
author_sort | VandenBosch, Leah S. |
collection | PubMed |
description | PURPOSE: Prior studies have demonstrated the significance of specific cis-regulatory variants in retinal disease; however, determining the functional impact of regulatory variants remains a major challenge. In this study, we utilized a machine learning approach, trained on epigenomic data from the adult human retina, to systematically quantify the predicted impact of cis-regulatory variants. METHODS: We used human retinal DNA accessibility data (ATAC-seq) to determine a set of 18.9k high-confidence, putative cis-regulatory elements (CREs). Eighty percent of these elements were used to train a machine learning model utilizing a gapped k-mer support vector machine–based approach. In silico saturation mutagenesis and variant scoring were applied to predict the functional impact of all potential single nucleotide variants within cis-regulatory elements. Impact scores were tested in a 20% hold-out dataset and compared to allele population frequency, phylogenetic conservation, transcription factor (TF) binding motifs, and existing massively parallel reporter assay data. RESULTS: We generated a model that distinguishes between human retinal regulatory elements and negative test sequences with 95% accuracy. Among a hold-out test set of 3.7k human retinal CREs, all possible single nucleotide variants were scored. Variants with negative impact scores correlated with higher phylogenetic conservation of the reference allele, disruption of predicted TF binding motifs, and massively parallel reporter expression. CONCLUSIONS: We demonstrated the utility of human retinal epigenomic data to train a machine learning model for the purpose of predicting the impact of non-coding regulatory sequence variants. Our model accurately scored sequences and predicted putative transcription factor binding motifs. This approach has the potential to expedite the characterization of pathogenic non-coding sequence variants in the context of unexplained retinal disease. TRANSLATIONAL RELEVANCE: This workflow and resulting dataset serve as a promising genomic tool to facilitate the clinical prioritization of functionally disruptive non-coding mutations in the retina. |
format | Online Article Text |
id | pubmed-9034719 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | The Association for Research in Vision and Ophthalmology |
record_format | MEDLINE/PubMed |
spelling | pubmed-9034719 2022-04-24 Machine Learning Prediction of Non-Coding Variant Impact in Human Retinal cis-Regulatory Elements VandenBosch, Leah S. Luu, Kelsey Timms, Andrew E. Challam, Shriya Wu, Yue Lee, Aaron Y. Cherry, Timothy J. Transl Vis Sci Technol Article The Association for Research in Vision and Ophthalmology 2022-04-18 /pmc/articles/PMC9034719/ /pubmed/35435921 http://dx.doi.org/10.1167/tvst.11.4.16 Text en Copyright 2022 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. |
title | Machine Learning Prediction of Non-Coding Variant Impact in Human Retinal cis-Regulatory Elements |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9034719/ https://www.ncbi.nlm.nih.gov/pubmed/35435921 http://dx.doi.org/10.1167/tvst.11.4.16 |
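The in silico saturation mutagenesis step named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the record describes a gapped k-mer support vector machine, whereas the code below swaps in a plain (ungapped) k-mer weight table as a stand-in scoring function. The names `kmer_score` and `saturation_mutagenesis`, the toy weight table, the example sequence, and the sign convention (alternate score minus reference score, so that a negative value means predicted loss of regulatory activity) are all assumptions made for illustration.

```python
BASES = "ACGT"


def kmer_score(seq, weights, k):
    """Sum the weights of every k-mer in seq; k-mers missing from the table add 0."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def saturation_mutagenesis(seq, weights, k):
    """Yield (position, ref, alt, impact) for every possible single nucleotide variant.

    impact = score(alternate sequence) - score(reference sequence).
    """
    ref_score = kmer_score(seq, weights, k)
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue  # skip the reference base itself
            mutated = seq[:pos] + alt + seq[pos + 1:]
            yield pos, ref, alt, kmer_score(mutated, weights, k) - ref_score


# Hypothetical toy model: a single positively weighted 4-mer.
toy_weights = {"TAAT": 1.0}
toy_sequence = "GCTAATCC"

results = list(saturation_mutagenesis(toy_sequence, toy_weights, k=4))
disruptive = [r for r in results if r[3] < 0]

for pos, ref, alt, impact in disruptive:
    print(f"pos {pos}: {ref}>{alt} impact {impact:+.1f}")
```

In this toy case every substitution inside the weighted 4-mer receives a negative impact and every substitution outside it scores zero. A trained model would instead assign weights across many k-mers, so impact scores would vary continuously along the element.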