
Accurate and highly interpretable prediction of gene expression from histone modifications

BACKGROUND: Histone Mark Modifications (HMs) are crucial actors in gene regulation, as they actively remodel chromatin to modulate transcriptional activity: aberrant combinatorial patterns of HMs have been connected with several diseases, including cancer. HMs are, however, reversible modifications:...


Bibliographic Details
Main authors: Frasca, Fabrizio; Matteucci, Matteo; Leone, Michele; Morelli, Marco J.; Masseroli, Marco
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9040271/
https://www.ncbi.nlm.nih.gov/pubmed/35473556
http://dx.doi.org/10.1186/s12859-022-04687-x
_version_ 1784694301487267840
author Frasca, Fabrizio
Matteucci, Matteo
Leone, Michele
Morelli, Marco J.
Masseroli, Marco
author_facet Frasca, Fabrizio
Matteucci, Matteo
Leone, Michele
Morelli, Marco J.
Masseroli, Marco
author_sort Frasca, Fabrizio
collection PubMed
description BACKGROUND: Histone Mark Modifications (HMs) are crucial actors in gene regulation, as they actively remodel chromatin to modulate transcriptional activity: aberrant combinatorial patterns of HMs have been connected with several diseases, including cancer. HMs are, however, reversible modifications: understanding their role in disease would allow the design of ‘epigenetic drugs’ for specific, non-invasive treatments. Standard statistical techniques were not entirely successful in extracting representative features from raw HM signals over gene locations. On the other hand, deep learning approaches allow for effective automatic feature extraction, but at the expense of model interpretation. RESULTS: Here, we propose ShallowChrome, a novel computational pipeline to model transcriptional regulation via HMs in both an accurate and interpretable way. We attain state-of-the-art results on the binary classification of gene transcriptional states over 56 cell-types from the REMC database, largely outperforming recent deep learning approaches. We interpret our models by extracting insightful gene-specific regulative patterns, and we analyse them for the specific case of the PAX5 gene over three differentiated blood cell lines. Finally, we compare the patterns we obtained with the characteristic emission patterns of ChromHMM, and show that ShallowChrome is able to coherently rank groups of chromatin states w.r.t. their transcriptional activity. CONCLUSIONS: In this work we demonstrate that it is possible to model HM-modulated gene expression regulation in a highly accurate, yet interpretable way. Our feature extraction algorithm leverages on data downstream the identification of enriched regions to retrieve gene-wise, statistically significant and dynamically located features for each HM. These features are highly predictive of gene transcriptional state, and allow for accurate modeling by computationally efficient logistic regression models. These models allow a direct inspection and a rigorous interpretation, helping to formulate quantifiable hypotheses.
format Online
Article
Text
id pubmed-9040271
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9040271 2022-04-27 Accurate and highly interpretable prediction of gene expression from histone modifications Frasca, Fabrizio Matteucci, Matteo Leone, Michele Morelli, Marco J. Masseroli, Marco BMC Bioinformatics Research BACKGROUND: Histone Mark Modifications (HMs) are crucial actors in gene regulation, as they actively remodel chromatin to modulate transcriptional activity: aberrant combinatorial patterns of HMs have been connected with several diseases, including cancer. HMs are, however, reversible modifications: understanding their role in disease would allow the design of ‘epigenetic drugs’ for specific, non-invasive treatments. Standard statistical techniques were not entirely successful in extracting representative features from raw HM signals over gene locations. On the other hand, deep learning approaches allow for effective automatic feature extraction, but at the expense of model interpretation. RESULTS: Here, we propose ShallowChrome, a novel computational pipeline to model transcriptional regulation via HMs in both an accurate and interpretable way. We attain state-of-the-art results on the binary classification of gene transcriptional states over 56 cell-types from the REMC database, largely outperforming recent deep learning approaches. We interpret our models by extracting insightful gene-specific regulative patterns, and we analyse them for the specific case of the PAX5 gene over three differentiated blood cell lines. Finally, we compare the patterns we obtained with the characteristic emission patterns of ChromHMM, and show that ShallowChrome is able to coherently rank groups of chromatin states w.r.t. their transcriptional activity. CONCLUSIONS: In this work we demonstrate that it is possible to model HM-modulated gene expression regulation in a highly accurate, yet interpretable way. Our feature extraction algorithm leverages on data downstream the identification of enriched regions to retrieve gene-wise, statistically significant and dynamically located features for each HM. These features are highly predictive of gene transcriptional state, and allow for accurate modeling by computationally efficient logistic regression models. These models allow a direct inspection and a rigorous interpretation, helping to formulate quantifiable hypotheses. BioMed Central 2022-04-26 /pmc/articles/PMC9040271/ /pubmed/35473556 http://dx.doi.org/10.1186/s12859-022-04687-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Frasca, Fabrizio
Matteucci, Matteo
Leone, Michele
Morelli, Marco J.
Masseroli, Marco
Accurate and highly interpretable prediction of gene expression from histone modifications
title Accurate and highly interpretable prediction of gene expression from histone modifications
title_full Accurate and highly interpretable prediction of gene expression from histone modifications
title_fullStr Accurate and highly interpretable prediction of gene expression from histone modifications
title_full_unstemmed Accurate and highly interpretable prediction of gene expression from histone modifications
title_short Accurate and highly interpretable prediction of gene expression from histone modifications
title_sort accurate and highly interpretable prediction of gene expression from histone modifications
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9040271/
https://www.ncbi.nlm.nih.gov/pubmed/35473556
http://dx.doi.org/10.1186/s12859-022-04687-x
work_keys_str_mv AT frascafabrizio accurateandhighlyinterpretablepredictionofgeneexpressionfromhistonemodifications
AT matteuccimatteo accurateandhighlyinterpretablepredictionofgeneexpressionfromhistonemodifications
AT leonemichele accurateandhighlyinterpretablepredictionofgeneexpressionfromhistonemodifications
AT morellimarcoj accurateandhighlyinterpretablepredictionofgeneexpressionfromhistonemodifications
AT masserolimarco accurateandhighlyinterpretablepredictionofgeneexpressionfromhistonemodifications
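
Illustrative note on the abstract: the description field states that per-HM features feed logistic regression models whose parameters can be inspected directly. The sketch below shows that general idea only. It is not the authors' ShallowChrome code and does not reproduce their feature extraction: the data are synthetic, the histone mark names in MARKS are an assumed example set, and the generating weights, learning rate and epoch count are invented for illustration.

```python
import math
import random

# Assumed example set of histone marks: one scalar feature per mark per gene.
MARKS = ["H3K4me3", "H3K4me1", "H3K36me3", "H3K9me3", "H3K27me3"]


def sigmoid(z):
    # Logistic function written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    # Probability that a gene is in the "expressed" state.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def make_synthetic_genes(n, rng):
    # Toy data: Gaussian features, labels drawn from a logistic model
    # whose weights are invented for this illustration.
    true_w = [2.0, 0.5, 1.5, -1.0, -2.0]
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in MARKS]
        X.append(x)
        y.append(1 if rng.random() < predict_proba(x, true_w, 0.0) else 0)
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=300):
    # Plain batch gradient descent on the mean log-loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = predict_proba(x, w, b) - t
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    # Fraction of genes whose thresholded prediction matches the label.
    hits = sum((predict_proba(x, w, b) >= 0.5) == (t == 1) for x, t in zip(X, y))
    return hits / len(y)


rng = random.Random(0)
X_train, y_train = make_synthetic_genes(1000, rng)
X_test, y_test = make_synthetic_genes(500, rng)
weights, bias = fit_logistic(X_train, y_train)
test_acc = accuracy(X_test, y_test, weights, bias)

# Inspecting the model: the sign and size of each weight indicate how the
# corresponding feature shifts the predicted transcriptional state.
for mark, w in zip(MARKS, weights):
    print(f"{mark:>9}: {w:+.2f}")
print(f"test accuracy: {test_acc:.2f}")
```

In this toy setting each fitted weight can be read as the direction in which one feature moves the predicted state, which is the kind of direct inspection the abstract attributes to logistic regression models. Nothing here says anything about the weights or accuracy reported in the article.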