Machine Learning Prediction of Non-Coding Variant Impact in Human Retinal cis-Regulatory Elements

Bibliographic Details
Main Authors: VandenBosch, Leah S., Luu, Kelsey, Timms, Andrew E., Challam, Shriya, Wu, Yue, Lee, Aaron Y., Cherry, Timothy J.
Format: Online Article Text
Language: English
Published: The Association for Research in Vision and Ophthalmology 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9034719/
https://www.ncbi.nlm.nih.gov/pubmed/35435921
http://dx.doi.org/10.1167/tvst.11.4.16
author VandenBosch, Leah S.
Luu, Kelsey
Timms, Andrew E.
Challam, Shriya
Wu, Yue
Lee, Aaron Y.
Cherry, Timothy J.
collection PubMed
description PURPOSE: Prior studies have demonstrated the significance of specific cis-regulatory variants in retinal disease; however, determining the functional impact of regulatory variants remains a major challenge. In this study, we utilized a machine learning approach, trained on epigenomic data from the adult human retina, to systematically quantify the predicted impact of cis-regulatory variants. METHODS: We used human retinal DNA accessibility data (ATAC-seq) to determine a set of 18.9k high-confidence, putative cis-regulatory elements. Eighty percent of these elements were used to train a machine learning model utilizing a gapped k-mer support vector machine–based approach. In silico saturation mutagenesis and variant scoring were applied to predict the functional impact of all potential single nucleotide variants within cis-regulatory elements. Impact scores were tested in a 20% hold-out dataset and compared to allele population frequency, phylogenetic conservation, transcription factor (TF) binding motifs, and existing massively parallel reporter assay data. RESULTS: We generated a model that distinguishes between human retinal regulatory elements and negative test sequences with 95% accuracy. Among a hold-out test set of 3.7k human retinal CREs, all possible single nucleotide variants were scored. Variants with negative impact scores correlated with higher phylogenetic conservation of the reference allele, disruption of predicted TF binding motifs, and massively parallel reporter expression. CONCLUSIONS: We demonstrated the utility of human retinal epigenomic data to train a machine learning model for the purpose of predicting the impact of non-coding regulatory sequence variants. Our model accurately scored sequences and predicted putative transcription factor binding motifs. This approach has the potential to expedite the characterization of pathogenic non-coding sequence variants in the context of unexplained retinal disease.
TRANSLATIONAL RELEVANCE: This workflow and resulting dataset serve as a promising genomic tool to facilitate the clinical prioritization of functionally disruptive non-coding mutations in the retina.
format Online
Article
Text
id pubmed-9034719
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher The Association for Research in Vision and Ophthalmology
record_format MEDLINE/PubMed
spelling pubmed-9034719 2022-04-24 Transl Vis Sci Technol Article The Association for Research in Vision and Ophthalmology 2022-04-18 /pmc/articles/PMC9034719/ /pubmed/35435921 http://dx.doi.org/10.1167/tvst.11.4.16 Text en Copyright 2022 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
title Machine Learning Prediction of Non-Coding Variant Impact in Human Retinal cis-Regulatory Elements
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9034719/
https://www.ncbi.nlm.nih.gov/pubmed/35435921
http://dx.doi.org/10.1167/tvst.11.4.16