Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns

BACKGROUND: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully a...

Bibliographic Details
Main authors: Osmala, Maria; Lähdesmäki, Harri
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7370432/
https://www.ncbi.nlm.nih.gov/pubmed/32689977
http://dx.doi.org/10.1186/s12859-020-03621-3
author Osmala, Maria
Lähdesmäki, Harri
collection PubMed
description BACKGROUND: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, the current methods predict different numbers and different sets of enhancers for the same cell type and do not utilise the pattern of the ChIP-seq coverage profiles efficiently. RESULTS: In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity between genomic query regions and the characteristic coverage patterns. The probabilistic scores of the enhancer and non-enhancer samples are utilised to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated based on the transcriptional regulatory protein binding sites and compared to the predictions obtained by state-of-the-art methods. CONCLUSION: PREPRINT compares favourably with the state-of-the-art methods, especially when the methods are required to predict a larger set of enhancers. PREPRINT generalises successfully to data from a cell type not utilised for training, and it often performs better than the previous methods. The enhancers predicted by PREPRINT are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid genome interpretation in functional genomics and clinical studies.
format Online
Article
Text
id pubmed-7370432
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7370432 2020-07-21 Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns Osmala, Maria Lähdesmäki, Harri BMC Bioinformatics Methodology Article BioMed Central 2020-07-20 /pmc/articles/PMC7370432/ /pubmed/32689977 http://dx.doi.org/10.1186/s12859-020-03621-3 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns
topic Methodology Article