
Prediction of gene regulatory enhancers across species reveals evolutionarily conserved sequence properties

Genomic regions with gene regulatory enhancer activity turn over rapidly across mammals. In contrast, gene expression patterns and transcription factor binding preferences are largely conserved between mammalian species. Based on this conservation, we hypothesized that enhancers active in different m...


Bibliographic Details
Main authors: Chen, Ling; Fish, Alexandra E.; Capra, John A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191148/
https://www.ncbi.nlm.nih.gov/pubmed/30286077
http://dx.doi.org/10.1371/journal.pcbi.1006484
author Chen, Ling
Fish, Alexandra E.
Capra, John A.
collection PubMed
description Genomic regions with gene regulatory enhancer activity turn over rapidly across mammals. In contrast, gene expression patterns and transcription factor binding preferences are largely conserved between mammalian species. Based on this conservation, we hypothesized that enhancers active in different mammals would exhibit conserved sequence patterns in spite of their different genomic locations. To investigate this hypothesis, we evaluated the extent to which sequence patterns that are predictive of enhancers in one species are predictive of enhancers in other mammalian species by training and testing two types of machine learning models. We trained support vector machine (SVM) and convolutional neural network (CNN) classifiers to distinguish enhancers defined by histone marks from the genomic background based on DNA sequence patterns in human, macaque, mouse, dog, cow, and opossum. The classifiers accurately identified many adult liver, developing limb, and developing brain enhancers, and the CNNs outperformed the SVMs. Furthermore, classifiers trained in one species and tested in another performed nearly as well as classifiers trained and tested on the same species. We observed similar cross-species conservation when applying the models to human and mouse enhancers validated in transgenic assays. This indicates that many short sequence patterns predictive of enhancers are largely conserved. The sequence patterns most predictive of enhancers in each species matched the binding motifs for a common set of transcription factors (TFs) enriched for expression in relevant tissues, supporting the biological relevance of the learned features. Thus, despite the rapid change of active enhancer locations between mammals, cross-species enhancer prediction is often possible. Our results suggest that short sequence patterns encoding enhancer activity have been maintained across more than 180 million years of mammalian evolution.
format Online
Article
Text
id pubmed-6191148
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6191148 2018-10-25 Prediction of gene regulatory enhancers across species reveals evolutionarily conserved sequence properties. Chen, Ling; Fish, Alexandra E.; Capra, John A. PLoS Comput Biol. Research Article. Public Library of Science, 2018-10-04. /pmc/articles/PMC6191148/ /pubmed/30286077 http://dx.doi.org/10.1371/journal.pcbi.1006484 Text, en. © 2018 Chen et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Prediction of gene regulatory enhancers across species reveals evolutionarily conserved sequence properties
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191148/
https://www.ncbi.nlm.nih.gov/pubmed/30286077
http://dx.doi.org/10.1371/journal.pcbi.1006484