
Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy

Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-...


Bibliographic Details
Main Authors: Arbel, Hamutal, Basu, Sumanta, Fisher, William W., Hammonds, Ann S., Wan, Kenneth H., Park, Soo, Weiszmann, Richard, Booth, Benjamin W., Keranen, Soile V., Henriquez, Clara, Shams Solari, Omid, Bickel, Peter J., Biggin, Mark D., Celniker, Susan E., Brown, James B.
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6338827/
https://www.ncbi.nlm.nih.gov/pubmed/30598455
http://dx.doi.org/10.1073/pnas.1808833115
author Arbel, Hamutal
Basu, Sumanta
Fisher, William W.
Hammonds, Ann S.
Wan, Kenneth H.
Park, Soo
Weiszmann, Richard
Booth, Benjamin W.
Keranen, Soile V.
Henriquez, Clara
Shams Solari, Omid
Bickel, Peter J.
Biggin, Mark D.
Celniker, Susan E.
Brown, James B.
author_sort Arbel, Hamutal
collection PubMed
description Identifying functional enhancer elements in metazoan systems is a major challenge. Large-scale validation of enhancers predicted by ENCODE reveals false-positive rates of at least 70%. We used the pregastrula-patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held-out data results from heterogeneity of functional signatures in enhancer elements. We show that at least two classes of enhancers are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, greater than 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multistage segmentation patterns, which we designate segmentation driving enhancers (SDE). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naïve Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome. An analysis of 32 SDEs, chosen throughout our prediction rank-list, using whole-mount embryonic imaging of stably integrated reporter constructs showed that >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.
format Online
Article
Text
id pubmed-6338827
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-6338827 2019-01-25 Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy Proc Natl Acad Sci U S A PNAS Plus National Academy of Sciences 2019-01-15 2018-12-31 /pmc/articles/PMC6338827/ /pubmed/30598455 http://dx.doi.org/10.1073/pnas.1808833115 Text en Copyright © 2019 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy
topic PNAS Plus