
Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications

BACKGROUND: The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, met...


Bibliographic Details
Main Authors: Hoang, Stephen A, Xu, Xiaojiang, Bekiranov, Stefan
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3170335/
https://www.ncbi.nlm.nih.gov/pubmed/21834981
http://dx.doi.org/10.1186/1756-0500-4-288
_version_ 1782211614811881472
author Hoang, Stephen A
Xu, Xiaojiang
Bekiranov, Stefan
author_facet Hoang, Stephen A
Xu, Xiaojiang
Bekiranov, Stefan
author_sort Hoang, Stephen A
collection PubMed
description BACKGROUND: The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, methods for estimating enrichment levels in ChIP-seq data for these computational studies are understudied and variable. Since the conclusions drawn from these data mining and machine learning applications strongly depend on the enrichment level inputs, a comparison of estimation methods with respect to the performance of statistical models should be made. RESULTS: Various methods were used to estimate the gene-wise ChIP-seq enrichment levels for 20 histone methylations and the histone variant H2A.Z. The Multivariate Adaptive Regression Splines (MARS) algorithm was applied for each estimation method using the estimation of enrichment levels as predictors and gene expression levels as responses. The methods used to estimate enrichment levels included tag counting and model-based methods that were applied to whole genes and specific gene regions. These methods were also applied to various sizes of estimation windows. The MARS model performance was assessed with the Generalized Cross-Validation Score (GCV). We determined that model-based methods of enrichment estimation that spatially weight enrichment based on average patterns provided an improvement over tag counting methods. Also, methods that included information across the entire gene body provided improvement over methods that focus on a specific sub-region of the gene (e.g., the 5' or 3' region). CONCLUSION: The performance of data mining and machine learning methods when applied to histone modification ChIP-seq data can be improved by using data across the entire gene body, and incorporating the spatial distribution of enrichment. 
Refinement of enrichment estimation ultimately improved accuracy of model predictions.
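The description above contrasts plain tag counting with estimates that weight enrichment spatially, and scores the resulting models with GCV. The following is a minimal sketch of those three ideas, not the authors' implementation. The function names, the equal-width binning, the normalized-profile weighting, and the `effective_params` argument are all assumptions made for illustration; the record does not specify how the model-based estimates or the MARS complexity term were computed.

```python
from bisect import bisect_left


def tag_count(tag_positions, start, end):
    """Count tags in the half-open window [start, end).

    tag_positions must be sorted in ascending order.
    """
    return bisect_left(tag_positions, end) - bisect_left(tag_positions, start)


def binned_counts(tag_positions, start, end, n_bins):
    """Split [start, end) into n_bins equal-width bins and count tags per bin."""
    width = (end - start) / n_bins
    counts = [0] * n_bins
    lo = bisect_left(tag_positions, start)
    hi = bisect_left(tag_positions, end)
    for pos in tag_positions[lo:hi]:
        # Clamp guards against floating-point rounding at the right edge.
        idx = min(int((pos - start) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def weighted_enrichment(counts, profile):
    """Weight per-bin counts by an average enrichment profile.

    Hypothetical scheme: the profile is normalized to sum to 1, so bins where
    the modification is typically enriched contribute more to the estimate.
    """
    total = sum(profile)
    return sum(c * w / total for c, w in zip(counts, profile))


def gcv(y_true, y_pred, effective_params):
    """Generalized cross-validation score: MSE / (1 - C/N)^2.

    effective_params (C) is supplied by the caller; how it is derived from
    the number of basis functions is an assumption left outside this sketch.
    """
    n = len(y_true)
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n
    return mse / (1.0 - effective_params / n) ** 2


# Toy data: sorted tag positions and a gene spanning [0, 1000).
tags = [100, 150, 220, 400, 950, 1200]
whole_gene = tag_count(tags, 0, 1000)
per_bin = binned_counts(tags, 0, 1000, 4)
weighted = weighted_enrichment(per_bin, [4, 2, 1, 1])
score = gcv([1, 2, 3, 4], [1, 2, 3, 5], effective_params=2)
```

In this toy setup the whole-gene count treats every tag equally, while the weighted estimate emphasizes bins near the 5' end because the made-up profile is highest there. A lower GCV score indicates a better trade-off between fit and model complexity.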
format Online
Article
Text
id pubmed-3170335
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-31703352011-09-10 Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications Hoang, Stephen A Xu, Xiaojiang Bekiranov, Stefan BMC Res Notes Research Article BACKGROUND: The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, methods for estimating enrichment levels in ChIP-seq data for these computational studies are understudied and variable. Since the conclusions drawn from these data mining and machine learning applications strongly depend on the enrichment level inputs, a comparison of estimation methods with respect to the performance of statistical models should be made. RESULTS: Various methods were used to estimate the gene-wise ChIP-seq enrichment levels for 20 histone methylations and the histone variant H2A.Z. The Multivariate Adaptive Regression Splines (MARS) algorithm was applied for each estimation method using the estimation of enrichment levels as predictors and gene expression levels as responses. The methods used to estimate enrichment levels included tag counting and model-based methods that were applied to whole genes and specific gene regions. These methods were also applied to various sizes of estimation windows. The MARS model performance was assessed with the Generalized Cross-Validation Score (GCV). We determined that model-based methods of enrichment estimation that spatially weight enrichment based on average patterns provided an improvement over tag counting methods. Also, methods that included information across the entire gene body provided improvement over methods that focus on a specific sub-region of the gene (e.g., the 5' or 3' region). 
CONCLUSION: The performance of data mining and machine learning methods when applied to histone modification ChIP-seq data can be improved by using data across the entire gene body, and incorporating the spatial distribution of enrichment. Refinement of enrichment estimation ultimately improved accuracy of model predictions. BioMed Central 2011-08-11 /pmc/articles/PMC3170335/ /pubmed/21834981 http://dx.doi.org/10.1186/1756-0500-4-288 Text en Copyright ©2011 Bekiranov et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Hoang, Stephen A
Xu, Xiaojiang
Bekiranov, Stefan
Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
title Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
title_full Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
title_fullStr Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
title_full_unstemmed Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
title_short Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
title_sort quantification of histone modification chip-seq enrichment for data mining and machine learning applications
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3170335/
https://www.ncbi.nlm.nih.gov/pubmed/21834981
http://dx.doi.org/10.1186/1756-0500-4-288
work_keys_str_mv AT hoangstephena quantificationofhistonemodificationchipseqenrichmentfordataminingandmachinelearningapplications
AT xuxiaojiang quantificationofhistonemodificationchipseqenrichmentfordataminingandmachinelearningapplications
AT bekiranovstefan quantificationofhistonemodificationchipseqenrichmentfordataminingandmachinelearningapplications