Accurate and highly interpretable prediction of gene expression from histone modifications
BACKGROUND: Histone Mark Modifications (HMs) are crucial actors in gene regulation, as they actively remodel chromatin to modulate transcriptional activity: aberrant combinatorial patterns of HMs have been connected with several diseases, including cancer. HMs are, however, reversible modifications:...
Main authors: | Frasca, Fabrizio; Matteucci, Matteo; Leone, Michele; Morelli, Marco J.; Masseroli, Marco |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9040271/ https://www.ncbi.nlm.nih.gov/pubmed/35473556 http://dx.doi.org/10.1186/s12859-022-04687-x |
_version_ | 1784694301487267840 |
---|---|
author | Frasca, Fabrizio Matteucci, Matteo Leone, Michele Morelli, Marco J. Masseroli, Marco |
author_sort | Frasca, Fabrizio |
collection | PubMed |
description | BACKGROUND: Histone Mark Modifications (HMs) are crucial actors in gene regulation, as they actively remodel chromatin to modulate transcriptional activity: aberrant combinatorial patterns of HMs have been connected with several diseases, including cancer. HMs are, however, reversible modifications: understanding their role in disease would allow the design of ‘epigenetic drugs’ for specific, non-invasive treatments. Standard statistical techniques were not entirely successful in extracting representative features from raw HM signals over gene locations. On the other hand, deep learning approaches allow for effective automatic feature extraction, but at the expense of model interpretation. RESULTS: Here, we propose ShallowChrome, a novel computational pipeline to model transcriptional regulation via HMs in both an accurate and interpretable way. We attain state-of-the-art results on the binary classification of gene transcriptional states over 56 cell-types from the REMC database, largely outperforming recent deep learning approaches. We interpret our models by extracting insightful gene-specific regulative patterns, and we analyse them for the specific case of the PAX5 gene over three differentiated blood cell lines. Finally, we compare the patterns we obtained with the characteristic emission patterns of ChromHMM, and show that ShallowChrome is able to coherently rank groups of chromatin states w.r.t. their transcriptional activity. CONCLUSIONS: In this work we demonstrate that it is possible to model HM-modulated gene expression regulation in a highly accurate, yet interpretable way. Our feature extraction algorithm leverages on data downstream the identification of enriched regions to retrieve gene-wise, statistically significant and dynamically located features for each HM. These features are highly predictive of gene transcriptional state, and allow for accurate modeling by computationally efficient logistic regression models. These models allow a direct inspection and a rigorous interpretation, helping to formulate quantifiable hypotheses. |
format | Online Article Text |
id | pubmed-9040271 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9040271 2022-04-27 Accurate and highly interpretable prediction of gene expression from histone modifications Frasca, Fabrizio Matteucci, Matteo Leone, Michele Morelli, Marco J. Masseroli, Marco BMC Bioinformatics Research BACKGROUND: Histone Mark Modifications (HMs) are crucial actors in gene regulation, as they actively remodel chromatin to modulate transcriptional activity: aberrant combinatorial patterns of HMs have been connected with several diseases, including cancer. HMs are, however, reversible modifications: understanding their role in disease would allow the design of ‘epigenetic drugs’ for specific, non-invasive treatments. Standard statistical techniques were not entirely successful in extracting representative features from raw HM signals over gene locations. On the other hand, deep learning approaches allow for effective automatic feature extraction, but at the expense of model interpretation. RESULTS: Here, we propose ShallowChrome, a novel computational pipeline to model transcriptional regulation via HMs in both an accurate and interpretable way. We attain state-of-the-art results on the binary classification of gene transcriptional states over 56 cell-types from the REMC database, largely outperforming recent deep learning approaches. We interpret our models by extracting insightful gene-specific regulative patterns, and we analyse them for the specific case of the PAX5 gene over three differentiated blood cell lines. Finally, we compare the patterns we obtained with the characteristic emission patterns of ChromHMM, and show that ShallowChrome is able to coherently rank groups of chromatin states w.r.t. their transcriptional activity. CONCLUSIONS: In this work we demonstrate that it is possible to model HM-modulated gene expression regulation in a highly accurate, yet interpretable way. Our feature extraction algorithm leverages on data downstream the identification of enriched regions to retrieve gene-wise, statistically significant and dynamically located features for each HM. These features are highly predictive of gene transcriptional state, and allow for accurate modeling by computationally efficient logistic regression models. These models allow a direct inspection and a rigorous interpretation, helping to formulate quantifiable hypotheses. BioMed Central 2022-04-26 /pmc/articles/PMC9040271/ /pubmed/35473556 http://dx.doi.org/10.1186/s12859-022-04687-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Accurate and highly interpretable prediction of gene expression from histone modifications |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9040271/ https://www.ncbi.nlm.nih.gov/pubmed/35473556 http://dx.doi.org/10.1186/s12859-022-04687-x |
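
The abstract in this record describes predicting a binary gene transcriptional state from per-gene histone-mark features with logistic regression, then reading the fitted model directly. The following is a minimal sketch of that general idea only, not the ShallowChrome pipeline itself: the data are synthetic, the feature values are random numbers rather than peak-derived signals, the histone-mark names are illustrative labels, and every function name is hypothetical.

```python
import math
import random

# Illustrative histone-mark labels (an assumption; the record does not list them).
HM_NAMES = ["H3K4me3", "H3K27me3", "H3K36me3", "H3K4me1", "H3K9me3"]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_genes(n_genes, seed=0):
    # One feature per mark per gene, drawn uniformly from [0, 1).
    # Labels come from a made-up linear rule, so the data are purely synthetic.
    rng = random.Random(seed)
    true_w = [2.0, -2.0, 1.0, 0.5, -1.0]
    true_b = -0.25
    X, y = [], []
    for _ in range(n_genes):
        x = [rng.random() for _ in HM_NAMES]
        logit = true_b + sum(w * v for w, v in zip(true_w, x))
        X.append(x)
        y.append(1 if logit > 0 else 0)
    return X, y


def fit_logistic_regression(X, y, epochs=1000, lr=1.0):
    # Plain batch gradient descent on the mean log-loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def accuracy(X, y, w, b):
    # Fraction of genes whose predicted state (probability >= 0.5) matches the label.
    hits = sum(
        (sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5) == bool(label)
        for x, label in zip(X, y)
    )
    return hits / len(X)


X, y = make_synthetic_genes(300)
weights, bias = fit_logistic_regression(X, y)

# Interpretation step: each weight's sign and magnitude is read off directly.
for name, wj in sorted(zip(HM_NAMES, weights), key=lambda p: -abs(p[1])):
    print(f"{name:>9s}  weight = {wj:+.2f}")
print(f"training accuracy = {accuracy(X, y, weights, bias):.2f}")
```

Because the model is linear in the features, a positive weight means a stronger signal for that mark raises the predicted probability of the active state, and a negative weight lowers it. This direct readability is what the abstract means by inspection and interpretation of the fitted models.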