
Comprehensive 100-bp resolution genome-wide epigenomic profiling data for the hg38 human reference genome

This manuscript presents a comprehensive collection of diverse epigenomic profiling data for the human genome at 100-bp resolution with full genome-wide coverage. The datasets were processed from raw read counts from five types of sequencing-based assays collected by the Encyclopedia of...


Bibliographic Details
Main Authors:	Li, Ronnie Y., Huang, Yanting, Zhao, Zhiyue, Qin, Zhaohui S.
Format:	Online Article Text
Language:	English
Published:	Elsevier 2022
Subjects:	Data Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9792340/ https://www.ncbi.nlm.nih.gov/pubmed/36582986 http://dx.doi.org/10.1016/j.dib.2022.108827

author	Li, Ronnie Y. Huang, Yanting Zhao, Zhiyue Qin, Zhaohui S.
collection	PubMed
description	This manuscript presents a comprehensive collection of diverse epigenomic profiling data for the human genome at 100-bp resolution with full genome-wide coverage. The datasets were processed from raw read counts from five types of sequencing-based assays collected by the Encyclopedia of DNA Elements consortium (ENCODE, http://www.encodeproject.org). Data from these high-throughput sequencing assays were processed and consolidated into a total of 6,305 genome-wide profiles. To ensure the quality of the features, we filtered out assays with low read depth, inconsistent read counts, or poor data quality. The assay types are DNase-seq, histone ChIP-seq, TF ChIP-seq, ATAC-seq, and Poly(A) RNA-seq. Processed data were merged by averaging read counts across technical replicates to obtain signals in about 30 million predefined 100-bp bins that tile the entire genome. We provide an example of fetching read counts for disease-related risk variants from the GWAS Catalog. Additionally, we have created a tabix index that enables fast retrieval of read counts for given coordinates in the human genome. The data processing pipeline can be reproduced for users' own purposes and for other experimental assays. The processed data can be found on Zenodo at https://zenodo.org/record/7015783. These data can be used as features for statistical and machine learning models to predict or infer a wide range of variables of biological interest. They can also be applied to generate novel insights into gene expression, chromatin accessibility, and epigenetic modifications across the human genome. Finally, the processing pipeline can be easily applied to data from any other genome-wide profiling assay, expanding the amount of available data.
format	Online Article Text
id	pubmed-9792340
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-9792340 2022-12-28 Comprehensive 100-bp resolution genome-wide epigenomic profiling data for the hg38 human reference genome Li, Ronnie Y. Huang, Yanting Zhao, Zhiyue Qin, Zhaohui S. Data Brief Data Article Elsevier 2022-12-14 /pmc/articles/PMC9792340/ /pubmed/36582986 http://dx.doi.org/10.1016/j.dib.2022.108827 Text en © 2022 The Author(s). This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).
title	Comprehensive 100-bp resolution genome-wide epigenomic profiling data for the hg38 human reference genome
topic	Data Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9792340/ https://www.ncbi.nlm.nih.gov/pubmed/36582986 http://dx.doi.org/10.1016/j.dib.2022.108827
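
The binning and replicate-averaging steps summarized in the description field can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that bins start at position 1 of each chromosome and are numbered from 0, which may differ from the convention used in the files on Zenodo, and every function name here (bin_index, bin_interval, average_replicates, tabix_region) is hypothetical. The only external convention relied on is that tabix region strings have the form chrom:start-end with 1-based inclusive coordinates.

```python
from statistics import fmean

BIN_SIZE = 100  # bin width in bp, as stated in the abstract


def bin_index(pos_1based: int, bin_size: int = BIN_SIZE) -> int:
    """Return the 0-based index of the bin containing a 1-based position.

    Assumption: the first bin covers positions 1..bin_size.
    """
    if pos_1based < 1:
        raise ValueError("genomic positions are 1-based and must be >= 1")
    return (pos_1based - 1) // bin_size


def bin_interval(index: int, bin_size: int = BIN_SIZE) -> tuple[int, int]:
    """Return the 1-based inclusive (start, end) coordinates of a bin."""
    start = index * bin_size + 1
    return start, start + bin_size - 1


def average_replicates(replicates: list[list[float]]) -> list[float]:
    """Average per-bin read counts across replicates covering the same bins."""
    if not replicates:
        raise ValueError("need at least one replicate")
    n_bins = len(replicates[0])
    if any(len(r) != n_bins for r in replicates):
        raise ValueError("all replicates must cover the same bins")
    # zip(*replicates) groups the counts that belong to the same bin.
    return [fmean(counts) for counts in zip(*replicates)]


def tabix_region(chrom: str, pos_1based: int) -> str:
    """Format the bin containing a variant as a tabix-style region string."""
    start, end = bin_interval(bin_index(pos_1based))
    return f"{chrom}:{start}-{end}"


if __name__ == "__main__":
    # Two toy replicates over three bins (made-up counts, for illustration).
    print(average_replicates([[2, 4, 0], [4, 4, 1]]))
    # Hypothetical variant position; not taken from the GWAS Catalog.
    print(tabix_region("chr1", 1000050))
```

Under these assumptions, a region string produced by tabix_region could be passed to the tabix command line tool together with the indexed data file to pull the row for the bin that contains a given variant. As a rough consistency check, a genome of about 3 billion bp divided into 100-bp bins gives about 30 million bins, in line with the figure quoted in the description.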
