Deep learning-assisted genome-wide characterization of massively parallel reporter assays

Massively parallel reporter assay (MPRA) is a high-throughput method that enables the study of the regulatory activities of tens of thousands of DNA oligonucleotides in a single experiment. While MPRA experiments have grown in popularity, their small sample sizes compared to the scale of the human g...

Bibliographic Details
Main Authors: Lu, Fred; Sossin, Aaron; Abell, Nathan; Montgomery, Stephen B; He, Zihuai
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9723615/
https://www.ncbi.nlm.nih.gov/pubmed/36350674
http://dx.doi.org/10.1093/nar/gkac990
_version_ 1784844222936907776
author Lu, Fred
Sossin, Aaron
Abell, Nathan
Montgomery, Stephen B
He, Zihuai
author_facet Lu, Fred
Sossin, Aaron
Abell, Nathan
Montgomery, Stephen B
He, Zihuai
author_sort Lu, Fred
collection PubMed
description Massively parallel reporter assay (MPRA) is a high-throughput method that enables the study of the regulatory activities of tens of thousands of DNA oligonucleotides in a single experiment. While MPRA experiments have grown in popularity, their small sample sizes compared to the scale of the human genome limit our understanding of the regulatory effects they detect. To address this, we develop a deep learning model, MpraNet, to distinguish potential MPRA targets from the background genome. This model achieves high discriminative performance (AUROC = 0.85) at differentiating MPRA positives from a set of control variants that mimic the background genome when applied to the lymphoblastoid cell line. We observe that existing functional scores represent very distinct functional effects, and most of them fail to characterize the regulatory effect that MPRA detects. Using MpraNet, we predict potential MPRA functional variants across the genome and identify the distributions of MPRA effect relative to other characteristics of genetic variation, including allele frequency, alternative functional annotations specified by FAVOR, and phenome-wide associations. We also observe that the predicted MPRA positives are not uniformly distributed across the genome; instead, they are clumped together in active regions comprising 9.95% of the genome and inactive regions comprising 89.07% of the genome. Furthermore, we propose our model as a screen to filter MPRA experiment candidates at genome-wide scale, enabling future experiments to be more cost-efficient by increasing precision relative to that observed from previous MPRAs.
format Online
Article
Text
id pubmed-9723615
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-97236152022-12-07 Deep learning-assisted genome-wide characterization of massively parallel reporter assays Lu, Fred Sossin, Aaron Abell, Nathan Montgomery, Stephen B He, Zihuai Nucleic Acids Res Computational Biology Massively parallel reporter assay (MPRA) is a high-throughput method that enables the study of the regulatory activities of tens of thousands of DNA oligonucleotides in a single experiment. While MPRA experiments have grown in popularity, their small sample sizes compared to the scale of the human genome limit our understanding of the regulatory effects they detect. To address this, we develop a deep learning model, MpraNet, to distinguish potential MPRA targets from the background genome. This model achieves high discriminative performance (AUROC = 0.85) at differentiating MPRA positives from a set of control variants that mimic the background genome when applied to the lymphoblastoid cell line. We observe that existing functional scores represent very distinct functional effects, and most of them fail to characterize the regulatory effect that MPRA detects. Using MpraNet, we predict potential MPRA functional variants across the genome and identify the distributions of MPRA effect relative to other characteristics of genetic variation, including allele frequency, alternative functional annotations specified by FAVOR, and phenome-wide associations. We also observe that the predicted MPRA positives are not uniformly distributed across the genome; instead, they are clumped together in active regions comprising 9.95% of the genome and inactive regions comprising 89.07% of the genome. Furthermore, we propose our model as a screen to filter MPRA experiment candidates at genome-wide scale, enabling future experiments to be more cost-efficient by increasing precision relative to that observed from previous MPRAs. 
Oxford University Press 2022-11-09 /pmc/articles/PMC9723615/ /pubmed/36350674 http://dx.doi.org/10.1093/nar/gkac990 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Computational Biology
Lu, Fred
Sossin, Aaron
Abell, Nathan
Montgomery, Stephen B
He, Zihuai
Deep learning-assisted genome-wide characterization of massively parallel reporter assays
title Deep learning-assisted genome-wide characterization of massively parallel reporter assays
title_full Deep learning-assisted genome-wide characterization of massively parallel reporter assays
title_fullStr Deep learning-assisted genome-wide characterization of massively parallel reporter assays
title_full_unstemmed Deep learning-assisted genome-wide characterization of massively parallel reporter assays
title_short Deep learning-assisted genome-wide characterization of massively parallel reporter assays
title_sort deep learning-assisted genome-wide characterization of massively parallel reporter assays
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9723615/
https://www.ncbi.nlm.nih.gov/pubmed/36350674
http://dx.doi.org/10.1093/nar/gkac990
work_keys_str_mv AT lufred deeplearningassistedgenomewidecharacterizationofmassivelyparallelreporterassays
AT sossinaaron deeplearningassistedgenomewidecharacterizationofmassivelyparallelreporterassays
AT abellnathan deeplearningassistedgenomewidecharacterizationofmassivelyparallelreporterassays
AT montgomerystephenb deeplearningassistedgenomewidecharacterizationofmassivelyparallelreporterassays
AT hezihuai deeplearningassistedgenomewidecharacterizationofmassivelyparallelreporterassays