
Deep learning-assisted genome-wide characterization of massively parallel reporter assays


Bibliographic Details
Main Authors:	Lu, Fred, Sossin, Aaron, Abell, Nathan, Montgomery, Stephen B, He, Zihuai
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Computational Biology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9723615/ https://www.ncbi.nlm.nih.gov/pubmed/36350674 http://dx.doi.org/10.1093/nar/gkac990

collection	PubMed
description	Massively parallel reporter assay (MPRA) is a high-throughput method that enables the study of the regulatory activities of tens of thousands of DNA oligonucleotides in a single experiment. While MPRA experiments have grown in popularity, their small sample sizes compared to the scale of the human genome limit our understanding of the regulatory effects they detect. To address this, we develop a deep learning model, MpraNet, to distinguish potential MPRA targets from the background genome. This model achieves high discriminative performance (AUROC = 0.85) at differentiating MPRA positives from a set of control variants that mimic the background genome when applied to the lymphoblastoid cell line. We observe that existing functional scores represent very distinct functional effects, and most of them fail to characterize the regulatory effect that MPRA detects. Using MpraNet, we predict potential MPRA functional variants across the genome and identify the distributions of MPRA effect relative to other characteristics of genetic variation, including allele frequency, alternative functional annotations specified by FAVOR, and phenome-wide associations. We also observed that the predicted MPRA positives are not uniformly distributed across the genome; instead, they are clumped together in active regions comprising 9.95% of the genome and inactive regions comprising 89.07% of the genome. Furthermore, we propose our model as a screen to filter MPRA experiment candidates at genome-wide scale, enabling future experiments to be more cost-efficient by increasing precision relative to that observed from previous MPRAs.
id	pubmed-9723615
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-9723615 2022-12-07 Nucleic Acids Res Oxford University Press 2022-11-09 /pmc/articles/PMC9723615/ /pubmed/36350674 http://dx.doi.org/10.1093/nar/gkac990 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.