Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns
BACKGROUND: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully a...
Main authors: | Osmala, Maria, Lähdesmäki, Harri |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7370432/ https://www.ncbi.nlm.nih.gov/pubmed/32689977 http://dx.doi.org/10.1186/s12859-020-03621-3 |
_version_ | 1783560975804268544 |
---|---|
author | Osmala, Maria Lähdesmäki, Harri |
author_facet | Osmala, Maria Lähdesmäki, Harri |
author_sort | Osmala, Maria |
collection | PubMed |
description | BACKGROUND: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, the current methods predict different numbers and different sets of enhancers for the same cell type and do not utilise the pattern of the ChIP-seq coverage profiles efficiently. RESULTS: In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity of the genomic query regions and the characteristic coverage patterns. The probabilistic scores of the enhancer and non-enhancer samples are utilised to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated based on the transcriptional regulatory protein binding sites and compared to the predictions obtained by state-of-the-art methods. CONCLUSION: PREPRINT performs favorably to the state-of-the-art methods, especially when requiring the methods to predict a larger set of enhancers. PREPRINT generalises successfully to data from cell type not utilised for training, and often the PREPRINT performs better than the previous methods. The PREPRINT enhancers are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid the genome interpretation in functional genomics and clinical studies. |
format | Online Article Text |
id | pubmed-7370432 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-73704322020-07-21 Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns Osmala, Maria Lähdesmäki, Harri BMC Bioinformatics Methodology Article BACKGROUND: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, the current methods predict different numbers and different sets of enhancers for the same cell type and do not utilise the pattern of the ChIP-seq coverage profiles efficiently. RESULTS: In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity of the genomic query regions and the characteristic coverage patterns. The probabilistic scores of the enhancer and non-enhancer samples are utilised to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated based on the transcriptional regulatory protein binding sites and compared to the predictions obtained by state-of-the-art methods. CONCLUSION: PREPRINT performs favorably to the state-of-the-art methods, especially when requiring the methods to predict a larger set of enhancers. PREPRINT generalises successfully to data from cell type not utilised for training, and often the PREPRINT performs better than the previous methods. The PREPRINT enhancers are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid the genome interpretation in functional genomics and clinical studies. BioMed Central 2020-07-20 /pmc/articles/PMC7370432/ /pubmed/32689977 http://dx.doi.org/10.1186/s12859-020-03621-3 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology Article Osmala, Maria Lähdesmäki, Harri Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns |
title | Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns |
title_full | Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns |
title_fullStr | Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns |
title_full_unstemmed | Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns |
title_short | Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns |
title_sort | enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7370432/ https://www.ncbi.nlm.nih.gov/pubmed/32689977 http://dx.doi.org/10.1186/s12859-020-03621-3 |
work_keys_str_mv | AT osmalamaria enhancerpredictioninthehumangenomebyprobabilisticmodellingofthechromatinfeaturepatterns AT lahdesmakiharri enhancerpredictioninthehumangenomebyprobabilisticmodellingofthechromatinfeaturepatterns |