Inferring mammalian tissue-specific regulatory conservation by predicting tissue-specific differences in open chromatin

Bibliographic Details
Main Authors: Kaplow, Irene M., Schäffer, Daniel E., Wirthlin, Morgan E., Lawler, Alyssa J., Brown, Ashley R., Kleyman, Michael, Pfenning, Andreas R.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8996547/
https://www.ncbi.nlm.nih.gov/pubmed/35410163
http://dx.doi.org/10.1186/s12864-022-08450-7
author Kaplow, Irene M.
Schäffer, Daniel E.
Wirthlin, Morgan E.
Lawler, Alyssa J.
Brown, Ashley R.
Kleyman, Michael
Pfenning, Andreas R.
collection PubMed
description BACKGROUND: Evolutionary conservation is an invaluable tool for inferring functional significance in the genome, including regions that are crucial across many species and those that have undergone convergent evolution. Computational methods to test for sequence conservation are dominated by algorithms that examine the ability of one or more nucleotides to align across large evolutionary distances. While these nucleotide alignment-based approaches have proven powerful for protein-coding genes and some non-coding elements, they fail to capture conservation of many enhancers, distal regulatory elements that control spatial and temporal patterns of gene expression. The function of enhancers is governed by a complex, often tissue- and cell type-specific code that links combinations of transcription factor binding sites and other regulation-related sequence patterns to regulatory activity. Thus, function of orthologous enhancer regions can be conserved across large evolutionary distances, even when nucleotide turnover is high. RESULTS: We present a new machine learning-based approach for evaluating enhancer conservation that leverages the combinatorial sequence code of enhancer activity rather than relying on the alignment of individual nucleotides. We first train a convolutional neural network model that can predict tissue-specific open chromatin, a proxy for enhancer activity, across mammals. Next, we apply that model to distinguish instances where the genome sequence would predict conserved function versus a loss of regulatory activity in that tissue. We present criteria for systematically evaluating model performance for this task and use them to demonstrate that our models accurately predict tissue-specific conservation and divergence in open chromatin between primate and rodent species, vastly out-performing leading nucleotide alignment-based approaches. 
We then apply our models to predict open chromatin at orthologs of brain and liver open chromatin regions across hundreds of mammals and find that brain enhancers associated with neuron activity have a stronger tendency than the general population to have predicted lineage-specific open chromatin. CONCLUSION: The framework presented here provides a mechanism to annotate tissue-specific regulatory function across hundreds of genomes and to study enhancer evolution using predicted regulatory differences rather than nucleotide-level conservation measurements. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-022-08450-7.
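The abstract describes the method only at a high level. As a minimal sketch of the general idea, and not the authors' model, the toy Python below treats each first-layer convolutional filter as a motif scanner over one-hot DNA, combines max-pooled filter activations into a probability of open chromatin, and then labels an ortholog as predicted conserved or predicted lost. Everything here is hypothetical: the function names (one_hot, motif_filter, conv_max_relu, predict_open, call_ortholog), the two motifs, the hand-set weights and bias (not trained), and the 0.5 threshold. For brevity the sketch scans only one strand.

```python
import math

BASES = "ACGT"


def one_hot(seq: str) -> list[list[float]]:
    """Encode DNA as an L x 4 matrix (columns A, C, G, T); N or other characters become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_filter(motif: str) -> tuple[list[list[float]], float]:
    """Hypothetical hand-set filter: +1 per matching base, bias set so only an exact match scores above 0."""
    return one_hot(motif), -(len(motif) - 1.0)


def conv_max_relu(x: list[list[float]], kernel: list[list[float]], bias: float) -> float:
    """Slide the filter along one strand, apply ReLU, and max-pool over all positions."""
    width = len(kernel)
    best = 0.0  # starting at 0.0 makes max(...) equivalent to ReLU followed by max-pooling
    for start in range(len(x) - width + 1):
        score = bias + sum(
            x[start + i][j] * kernel[i][j] for i in range(width) for j in range(4)
        )
        best = max(best, score)
    return best


def predict_open(seq, filters, weights, bias) -> float:
    """Toy one-layer model: pooled filter activations -> logistic probability of open chromatin."""
    x = one_hot(seq)
    features = [conv_max_relu(x, kernel, b) for kernel, b in filters]
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters: both motifs must be present for a high score (a combinatorial rule).
FILTERS = [motif_filter("TGACTCA"), motif_filter("CACGTG")]
WEIGHTS = [3.0, 3.0]
BIAS = -4.0


def call_ortholog(seq: str, threshold: float = 0.5) -> str:
    """Label an ortholog as predicted conserved or predicted lost open chromatin."""
    p = predict_open(seq, FILTERS, WEIGHTS, BIAS)
    return "conserved" if p >= threshold else "lost"


if __name__ == "__main__":
    reference = "AATGACTCAGGTTCACGTGAA"   # region assumed open in the reference species
    rearranged = "CACGTGCCTTGACTCATTGC"   # both motifs kept, order and flanks changed
    broken = "AATGACTGAGGTTCACGTGAA"      # one substitution destroys the first motif
    for name, seq in [("reference", reference), ("rearranged", rearranged), ("broken", broken)]:
        print(name, round(predict_open(seq, FILTERS, WEIGHTS, BIAS), 3), call_ortholog(seq))
```

In this toy, the rearranged ortholog shares little position-by-position identity with the reference but keeps both motifs, so it is still called conserved, while a single substitution that breaks one motif is enough to call a loss. This mirrors, in miniature, the abstract's point that regulatory function can be conserved despite high nucleotide turnover.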
format Online
Article
Text
id pubmed-8996547
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8996547 2022-04-12 BMC Genomics Research BioMed Central 2022-04-11 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Inferring mammalian tissue-specific regulatory conservation by predicting tissue-specific differences in open chromatin
topic Research