Universal annotation of the human genome through integration of over a thousand epigenomic datasets

BACKGROUND: Genome-wide maps of chromatin marks such as histone modifications and open chromatin sites provide valuable information for annotating the non-coding genome, including identifying regulatory elements. Computational approaches such as ChromHMM have been applied to discover and annotate ch...

Bibliographic Details
Main Authors:	Vu, Ha; Ernst, Jason
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8734071/ https://www.ncbi.nlm.nih.gov/pubmed/34991667 http://dx.doi.org/10.1186/s13059-021-02572-z

_version_	1784627936950747136
author	Vu, Ha Ernst, Jason
author_facet	Vu, Ha Ernst, Jason
author_sort	Vu, Ha
collection	PubMed
description	BACKGROUND: Genome-wide maps of chromatin marks such as histone modifications and open chromatin sites provide valuable information for annotating the non-coding genome, including identifying regulatory elements. Computational approaches such as ChromHMM have been applied to discover and annotate chromatin states defined by combinatorial and spatial patterns of chromatin marks within the same cell type. An alternative “stacked modeling” approach was previously suggested, where chromatin states are defined jointly from datasets of multiple cell types to produce a single universal genome annotation based on all datasets. Despite its potential benefits for applications that are not specific to one cell type, such an approach was previously applied only for small-scale specialized purposes. Large-scale applications of stacked modeling have previously posed scalability challenges. RESULTS: Using a version of ChromHMM enhanced for large-scale applications, we apply the stacked modeling approach to produce a universal chromatin state annotation of the human genome using over 1000 datasets from more than 100 cell types, with the learned model denoted as the full-stack model. The full-stack model states show distinct enrichments for external genomic annotations, which we use in characterizing each state. Compared to per-cell-type annotations, the full-stack annotations directly differentiate constitutive from cell type-specific activity and are more predictive of locations of external genomic annotations. CONCLUSIONS: The full-stack ChromHMM model provides a universal chromatin state annotation of the genome and a unified global view of over 1000 datasets. We expect this to be a useful resource that complements existing per-cell-type annotations for studying the non-coding human genome. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-021-02572-z.
format	Online Article Text
id	pubmed-8734071
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8734071 2022-01-07 Universal annotation of the human genome through integration of over a thousand epigenomic datasets Vu, Ha Ernst, Jason Genome Biol Research BioMed Central 2022-01-06 /pmc/articles/PMC8734071/ /pubmed/34991667 http://dx.doi.org/10.1186/s13059-021-02572-z Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Vu, Ha Ernst, Jason Universal annotation of the human genome through integration of over a thousand epigenomic datasets
title	Universal annotation of the human genome through integration of over a thousand epigenomic datasets
title_full	Universal annotation of the human genome through integration of over a thousand epigenomic datasets
title_fullStr	Universal annotation of the human genome through integration of over a thousand epigenomic datasets
title_full_unstemmed	Universal annotation of the human genome through integration of over a thousand epigenomic datasets
title_short	Universal annotation of the human genome through integration of over a thousand epigenomic datasets
title_sort	universal annotation of the human genome through integration of over a thousand epigenomic datasets
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8734071/ https://www.ncbi.nlm.nih.gov/pubmed/34991667 http://dx.doi.org/10.1186/s13059-021-02572-z
work_keys_str_mv	AT vuha universalannotationofthehumangenomethroughintegrationofoverathousandepigenomicdatasets AT ernstjason universalannotationofthehumangenomethroughintegrationofoverathousandepigenomicdatasets
