Direct prediction of regulatory elements from partial data without imputation

Genome segmentation approaches allow us to characterize regulatory states in a given cell type using combinatorial patterns of histone modifications and other regulatory signals. In order to analyze regulatory state differences across cell types, current genome segmentation approaches typically requ...

Bibliographic Details
Main Authors: Zhang, Yu; Mahony, Shaun
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6855516/
https://www.ncbi.nlm.nih.gov/pubmed/31682602
http://dx.doi.org/10.1371/journal.pcbi.1007399
_version_ 1783470416667344896
author Zhang, Yu
Mahony, Shaun
author_facet Zhang, Yu
Mahony, Shaun
author_sort Zhang, Yu
collection PubMed
description Genome segmentation approaches allow us to characterize regulatory states in a given cell type using combinatorial patterns of histone modifications and other regulatory signals. In order to analyze regulatory state differences across cell types, current genome segmentation approaches typically require that the same regulatory genomics assays have been performed in all analyzed cell types. This necessarily limits both the numbers of cell types that can be analyzed and the complexity of the resulting regulatory states, as only a small number of histone modifications have been profiled across many cell types. Data imputation approaches that aim to estimate missing regulatory signals have been applied before genome segmentation. However, this approach is computationally costly and propagates any errors in imputation to produce incorrect genome segmentation results downstream. We present an extension to the IDEAS genome segmentation platform which can perform genome segmentation on incomplete regulatory genomics dataset collections without using imputation. Instead of relying on imputed data, we use an expectation-maximization approach to estimate marginal density functions within each regulatory state. We demonstrate that our genome segmentation results compare favorably with approaches based on imputation or other strategies for handling missing data. We further show that our approach can accurately impute missing data after genome segmentation, reversing the typical order of imputation/genome segmentation pipelines. Finally, we present a new 2D genome segmentation analysis of 127 human cell types studied by the Roadmap Epigenomics Consortium. By using an expanded set of chromatin marks that have been profiled in subsets of these cell types, our new segmentation results capture a more complex picture of combinatorial regulatory patterns that appear on the human genome.
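The idea in the description of running expectation-maximization directly on incomplete data, rather than imputing first, can be illustrated with a small sketch. This is not the IDEAS implementation: it assumes a plain mixture of diagonal Gaussians, and the function names (fit_em_with_missing, impute), the variance floor, and the toy data are all hypothetical choices made for illustration. A missing signal is marked with None and is simply left out of the likelihood for that row, which amounts to evaluating the marginal density over the observed marks only. Missing values are then filled in after the state assignment, mirroring the reversed order described above.

```python
import math
import random


def fit_em_with_missing(rows, n_states, n_iter=50, seed=0):
    """Fit a diagonal-Gaussian mixture by EM; None marks a missing signal."""
    rng = random.Random(seed)
    n_marks = len(rows[0])
    # Initialise from per-mark statistics of the observed entries only.
    obs = [[r[j] for r in rows if r[j] is not None] for j in range(n_marks)]
    gmean = [sum(o) / len(o) for o in obs]
    gvar = [max(sum((x - m) ** 2 for x in o) / len(o), 1e-3)
            for o, m in zip(obs, gmean)]
    means = [[m + rng.gauss(0, math.sqrt(v)) for m, v in zip(gmean, gvar)]
             for _ in range(n_states)]
    variances = [list(gvar) for _ in range(n_states)]
    weights = [1.0 / n_states] * n_states
    resp = []
    for _ in range(n_iter):
        # E-step: state posteriors from the marginal density of observed marks.
        resp = []
        for r in rows:
            logp = []
            for k in range(n_states):
                lp = math.log(weights[k])
                for j, x in enumerate(r):
                    if x is None:
                        continue  # missing mark: integrates out of the product
                    v = variances[k][j]
                    lp -= 0.5 * (math.log(2 * math.pi * v)
                                 + (x - means[k][j]) ** 2 / v)
                logp.append(lp)
            top = max(logp)
            p = [math.exp(l - top) for l in logp]
            total = sum(p)
            resp.append([q / total for q in p])
        # M-step: each mark is updated from the rows where it was observed.
        for k in range(n_states):
            nk = sum(g[k] for g in resp)
            weights[k] = max(nk / len(rows), 1e-12)
            for j in range(n_marks):
                pairs = [(g[k], r[j]) for g, r in zip(resp, rows)
                         if r[j] is not None]
                w = sum(a for a, _ in pairs)
                if w < 1e-9:
                    continue  # no support for this state/mark; keep old values
                mu = sum(a * x for a, x in pairs) / w
                means[k][j] = mu
                variances[k][j] = max(
                    sum(a * (x - mu) ** 2 for a, x in pairs) / w, 1e-3)
    return weights, means, variances, resp


def impute(row, resp_row, means):
    """Fill missing marks with the posterior-weighted state means."""
    return [x if x is not None
            else sum(g * m[j] for g, m in zip(resp_row, means))
            for j, x in enumerate(row)]


# Toy data (invented): two signal levels, three marks, at most one mark hidden.
data_rng = random.Random(1)
rows = []
for i in range(200):
    centre = 0.0 if i % 2 == 0 else 10.0
    row = [data_rng.gauss(centre, 1.0) for _ in range(3)]
    hide = data_rng.randrange(4)  # a value of 3 keeps every mark
    if hide < 3:
        row[hide] = None
    rows.append(row)

weights, means, variances, resp = fit_em_with_missing(rows, n_states=2)
filled = [impute(r, g, means) for r, g in zip(rows, resp)]
```

Because no value is invented before fitting, an imputation error cannot leak into the state estimates; the cost is that a mark observed in few rows gives a noisier estimate for that mark within each state.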
format Online
Article
Text
id pubmed-6855516
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6855516 2019-12-06 Direct prediction of regulatory elements from partial data without imputation Zhang, Yu Mahony, Shaun PLoS Comput Biol Research Article Public Library of Science 2019-11-04 /pmc/articles/PMC6855516/ /pubmed/31682602 http://dx.doi.org/10.1371/journal.pcbi.1007399 Text en © 2019 Zhang, Mahony http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Zhang, Yu
Mahony, Shaun
Direct prediction of regulatory elements from partial data without imputation
title Direct prediction of regulatory elements from partial data without imputation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6855516/
https://www.ncbi.nlm.nih.gov/pubmed/31682602
http://dx.doi.org/10.1371/journal.pcbi.1007399