Deep learning-assisted genome-wide characterization of massively parallel reporter assays
Main authors: | Lu, Fred; Sossin, Aaron; Abell, Nathan; Montgomery, Stephen B; He, Zihuai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9723615/ https://www.ncbi.nlm.nih.gov/pubmed/36350674 http://dx.doi.org/10.1093/nar/gkac990 |
author | Lu, Fred; Sossin, Aaron; Abell, Nathan; Montgomery, Stephen B; He, Zihuai |
---|---|
collection | PubMed |
description | Massively parallel reporter assay (MPRA) is a high-throughput method that enables the study of the regulatory activities of tens of thousands of DNA oligonucleotides in a single experiment. While MPRA experiments have grown in popularity, their small sample sizes compared to the scale of the human genome limit our understanding of the regulatory effects they detect. To address this, we develop a deep learning model, MpraNet, to distinguish potential MPRA targets from the background genome. This model achieves high discriminative performance (AUROC = 0.85) at differentiating MPRA positives from a set of control variants that mimic the background genome when applied to the lymphoblastoid cell line. We observe that existing functional scores represent very distinct functional effects, and most of them fail to characterize the regulatory effect that MPRA detects. Using MpraNet, we predict potential MPRA functional variants across the genome and identify the distributions of MPRA effect relative to other characteristics of genetic variation, including allele frequency, alternative functional annotations specified by FAVOR, and phenome-wide associations. We also observe that the predicted MPRA positives are not uniformly distributed across the genome; instead, they are clumped together in active regions comprising 9.95% of the genome and inactive regions comprising 89.07% of the genome. Furthermore, we propose our model as a screen to filter MPRA experiment candidates at genome-wide scale, enabling future experiments to be more cost-efficient by increasing precision relative to that observed from previous MPRAs. |
format | Online Article Text |
id | pubmed-9723615 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9723615 2022-12-07; Nucleic Acids Res; Computational Biology; Oxford University Press 2022-11-09; /pmc/articles/PMC9723615/ /pubmed/36350674 http://dx.doi.org/10.1093/nar/gkac990; Text en; © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Deep learning-assisted genome-wide characterization of massively parallel reporter assays |
topic | Computational Biology |