
Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays


Bibliographic Details
Main Authors: Movva, Rajiv; Greenside, Peyton; Marinov, Georgi K.; Nair, Surag; Shrikumar, Avanti; Kundaje, Anshul
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6576758/
https://www.ncbi.nlm.nih.gov/pubmed/31206543
http://dx.doi.org/10.1371/journal.pone.0218073
collection PubMed
description The relationship between noncoding DNA sequence and gene expression is not well understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset that measures the activity of ∼500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and to fine-map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.
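The abstract reports model performance as a Spearman correlation between predicted and measured activity, and describes scoring the allelic effects of variants. The following is a minimal sketch of those two generic steps, assuming a model that maps a one-hot encoded sequence (columns A, C, G, T) to a single activity value and scoring a variant as the predicted activity of the alternate allele minus that of the reference allele. The names one_hot, allelic_effect, average_ranks and spearman, and the toy model in the usage note, are hypothetical illustrations, not MPRA-DragoNN's published code or API.

```python
# Illustrative sketch only (not MPRA-DragoNN's code). Assumption: any
# sequence-to-activity model can be wrapped as a callable that takes a
# one-hot matrix and returns one float. All helper names are hypothetical.
from typing import Callable, List, Sequence

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as a len(seq) x 4 matrix (columns A, C, G, T); N -> all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def allelic_effect(model: Callable[[List[List[int]]], float],
                   ref_seq: str, alt_seq: str) -> float:
    """Predicted activity change caused by a variant: model(alt) - model(ref)."""
    if len(ref_seq) != len(alt_seq):
        raise ValueError("this sketch only handles substitutions (equal lengths)")
    return model(one_hot(alt_seq)) - model(one_hot(ref_seq))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(pred: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(pred) != len(measured) or len(pred) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    x, y = average_ranks(pred), average_ranks(measured)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical toy "model": counts G bases (column index 2).
    toy_model = lambda m: float(sum(row[2] for row in m))
    print(allelic_effect(toy_model, "ACGT", "AGGT"))
    print(spearman([0.1, 0.4, 0.35, 0.8], [0.2, 0.5, 0.3, 0.9]))
```

Ranking before correlating makes the metric insensitive to monotone rescaling of either predictions or measurements, which suits activity readouts whose scale differs between experiments. In practice, scipy.stats.spearmanr computes the same tie-averaged statistic.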
id pubmed-6576758
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6576758 2019-06-28 PLoS One Research Article
Public Library of Science 2019-06-17. © 2019 Movva et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article