Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications

BACKGROUND: The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, met...

Bibliographic Details
Main Authors:	Hoang, Stephen A, Xu, Xiaojiang, Bekiranov, Stefan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3170335/ https://www.ncbi.nlm.nih.gov/pubmed/21834981 http://dx.doi.org/10.1186/1756-0500-4-288

author	Hoang, Stephen A Xu, Xiaojiang Bekiranov, Stefan
author_facet	Hoang, Stephen A Xu, Xiaojiang Bekiranov, Stefan
author_sort	Hoang, Stephen A
collection	PubMed
description	BACKGROUND: The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, methods for estimating enrichment levels in ChIP-seq data for these computational studies are understudied and variable. Since the conclusions drawn from these data mining and machine learning applications strongly depend on the enrichment level inputs, a comparison of estimation methods with respect to the performance of statistical models should be made. RESULTS: Various methods were used to estimate the gene-wise ChIP-seq enrichment levels for 20 histone methylations and the histone variant H2A.Z. The Multivariate Adaptive Regression Splines (MARS) algorithm was applied for each estimation method using the estimation of enrichment levels as predictors and gene expression levels as responses. The methods used to estimate enrichment levels included tag counting and model-based methods that were applied to whole genes and specific gene regions. These methods were also applied to various sizes of estimation windows. The MARS model performance was assessed with the Generalized Cross-Validation Score (GCV). We determined that model-based methods of enrichment estimation that spatially weight enrichment based on average patterns provided an improvement over tag counting methods. Also, methods that included information across the entire gene body provided improvement over methods that focus on a specific sub-region of the gene (e.g., the 5' or 3' region). CONCLUSION: The performance of data mining and machine learning methods when applied to histone modification ChIP-seq data can be improved by using data across the entire gene body, and incorporating the spatial distribution of enrichment. Refinement of enrichment estimation ultimately improved accuracy of model predictions.
format	Online Article Text
id	pubmed-3170335
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3170335 2011-09-10 Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications Hoang, Stephen A Xu, Xiaojiang Bekiranov, Stefan BMC Res Notes Research Article BioMed Central 2011-08-11 /pmc/articles/PMC3170335/ /pubmed/21834981 http://dx.doi.org/10.1186/1756-0500-4-288 Text en Copyright ©2011 Bekiranov et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Hoang, Stephen A Xu, Xiaojiang Bekiranov, Stefan Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
title	Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
title_full	Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
title_fullStr	Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
title_full_unstemmed	Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
title_short	Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
title_sort	quantification of histone modification chip-seq enrichment for data mining and machine learning applications
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3170335/ https://www.ncbi.nlm.nih.gov/pubmed/21834981 http://dx.doi.org/10.1186/1756-0500-4-288
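
Illustrative note on the abstract: the description contrasts plain tag counting with enrichment estimates that weight tags by their position along the gene, and it scores MARS models with the Generalized Cross-Validation (GCV) score. The Python sketch below is only a rough illustration of those three ideas and is not the authors' implementation. All function names, the binned weighting profile, and the way the effective number of parameters is passed in are assumptions made for this example.

```python
import numpy as np


def tag_count_enrichment(tag_positions, start, end):
    """Plain tag counting: number of tags falling in [start, end)."""
    pos = np.asarray(tag_positions, dtype=float)
    return int(np.sum((pos >= start) & (pos < end)))


def weighted_enrichment(tag_positions, start, end, profile):
    """Spatially weighted count (hypothetical scheme for illustration).

    `profile` is an assumed average enrichment pattern given as one
    non-negative weight per equal-width bin along the region. It is
    normalized and rescaled so that a flat profile reproduces the
    plain tag count.
    """
    pos = np.asarray(tag_positions, dtype=float)
    inside = pos[(pos >= start) & (pos < end)]
    weights = np.asarray(profile, dtype=float)
    weights = weights / weights.sum()
    # Relative position of each tag in [0, 1), mapped to a profile bin.
    rel = (inside - start) / (end - start)
    bins = np.minimum((rel * len(weights)).astype(int), len(weights) - 1)
    return float(weights[bins].sum() * len(weights))


def gcv(y, y_hat, effective_params):
    """GCV score: mean squared error inflated by model complexity.

    `effective_params` is supplied by the caller (assumption); MARS
    implementations derive it from the number of terms and knots.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    mse = np.mean((y - y_hat) ** 2)
    return float(mse / (1.0 - effective_params / n) ** 2)


# Toy data: five tag positions and a gene spanning [0, 1000).
tags = [100, 150, 900, 950, 2000]
raw = tag_count_enrichment(tags, 0, 1000)
flat = weighted_enrichment(tags, 0, 1000, [1, 1, 1, 1])
five_prime = weighted_enrichment(tags, 0, 1000, [3, 1, 0, 0])
score = gcv([1, 2, 3, 4], [1, 2, 3, 2], effective_params=2)
```

In this toy case the flat profile gives the same value as the raw count, while the 5'-weighted profile up-weights the two tags near the start of the gene and ignores the two near the end. A lower GCV score indicates a better trade-off between fit and model complexity.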
