Integrating Diverse Datasets Improves Developmental Enhancer Prediction

Bibliographic Details
Main authors: Erwin, Genevieve D., Oksenberg, Nir, Truty, Rebecca M., Kostka, Dennis, Murphy, Karl K., Ahituv, Nadav, Pollard, Katherine S., Capra, John A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4072507/
https://www.ncbi.nlm.nih.gov/pubmed/24967590
http://dx.doi.org/10.1371/journal.pcbi.1003677
collection PubMed
description Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. 
Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable researchers to further investigate questions in developmental biology.
id pubmed-4072507
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4072507 2014-07-02 PLoS Comput Biol Research Article Public Library of Science 2014-06-26 /pmc/articles/PMC4072507/ /pubmed/24967590 http://dx.doi.org/10.1371/journal.pcbi.1003677 Text en © 2014 Erwin et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article