Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns

BACKGROUND: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully a...

Bibliographic Details
Main Authors:	Osmala, Maria; Lähdesmäki, Harri
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7370432/ https://www.ncbi.nlm.nih.gov/pubmed/32689977 http://dx.doi.org/10.1186/s12859-020-03621-3

_version_	1783560975804268544
author	Osmala, Maria; Lähdesmäki, Harri
collection	PubMed
description	BACKGROUND: The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, the current methods predict different numbers and different sets of enhancers for the same cell type and do not utilise the pattern of the ChIP-seq coverage profiles efficiently. RESULTS: In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity of the genomic query regions and the characteristic coverage patterns. The probabilistic scores of the enhancer and non-enhancer samples are utilised to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated based on the transcriptional regulatory protein binding sites and compared to the predictions obtained by state-of-the-art methods. CONCLUSION: PREPRINT compares favorably with the state-of-the-art methods, especially when the methods are required to predict a larger set of enhancers. PREPRINT generalises successfully to data from a cell type not utilised for training, and often PREPRINT performs better than the previous methods. The PREPRINT enhancers are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid genome interpretation in functional genomics and clinical studies.
format	Online Article Text
id	pubmed-7370432
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7370432 2020-07-21 Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns Osmala, Maria; Lähdesmäki, Harri BMC Bioinformatics Methodology Article BioMed Central 2020-07-20 /pmc/articles/PMC7370432/ /pubmed/32689977 http://dx.doi.org/10.1186/s12859-020-03621-3 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7370432/ https://www.ncbi.nlm.nih.gov/pubmed/32689977 http://dx.doi.org/10.1186/s12859-020-03621-3

Similar Items