Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications
Main authors: | Hoang, Stephen A; Xu, Xiaojiang; Bekiranov, Stefan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3170335/ https://www.ncbi.nlm.nih.gov/pubmed/21834981 http://dx.doi.org/10.1186/1756-0500-4-288 |
author | Hoang, Stephen A; Xu, Xiaojiang; Bekiranov, Stefan |
collection | PubMed |
description | BACKGROUND: The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, methods for estimating enrichment levels in ChIP-seq data for these computational studies are understudied and variable. Since the conclusions drawn from these data mining and machine learning applications strongly depend on the enrichment level inputs, a comparison of estimation methods with respect to the performance of statistical models should be made. RESULTS: Various methods were used to estimate the gene-wise ChIP-seq enrichment levels for 20 histone methylations and the histone variant H2A.Z. The Multivariate Adaptive Regression Splines (MARS) algorithm was applied for each estimation method using the estimation of enrichment levels as predictors and gene expression levels as responses. The methods used to estimate enrichment levels included tag counting and model-based methods that were applied to whole genes and specific gene regions. These methods were also applied to various sizes of estimation windows. The MARS model performance was assessed with the Generalized Cross-Validation Score (GCV). We determined that model-based methods of enrichment estimation that spatially weight enrichment based on average patterns provided an improvement over tag counting methods. Also, methods that included information across the entire gene body provided improvement over methods that focus on a specific sub-region of the gene (e.g., the 5' or 3' region). CONCLUSION: The performance of data mining and machine learning methods when applied to histone modification ChIP-seq data can be improved by using data across the entire gene body, and incorporating the spatial distribution of enrichment. Refinement of enrichment estimation ultimately improved accuracy of model predictions. |
id | pubmed-3170335 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-3170335 2011-09-10 BMC Res Notes Research Article BioMed Central 2011-08-11 /pmc/articles/PMC3170335/ /pubmed/21834981 http://dx.doi.org/10.1186/1756-0500-4-288 Text en Copyright ©2011 Bekiranov et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications |
topic | Research Article |
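The abstract contrasts tag counting with model-based estimates that spatially weight enrichment by an average pattern, and it scores MARS fits with the Generalized Cross-Validation (GCV) score. The following is a minimal sketch of those ideas under stated assumptions, not the authors' implementation: each gene is assumed to be pre-binned into equal numbers of tag-count bins from 5' to 3' (a hypothetical scaling scheme); the "model-based" estimate is taken to be the least-squares amplitude of a gene's bins against the cross-gene average profile (one plausible reading of "spatially weight enrichment based on average patterns"); and the effective number of parameters used by GCV is assumed to be supplied by the caller. All function names are hypothetical.

```python
from typing import List, Optional, Sequence


def tag_count(bins: Sequence[float], start: int = 0, end: Optional[int] = None) -> float:
    """Tag-counting estimate: total reads in bins[start:end].

    The full vector stands in for a whole-gene window; a slice such as
    bins[:k] stands in for a 5' sub-region (hypothetical binning scheme).
    """
    return float(sum(bins[start:end]))


def average_profile(genes: Sequence[Sequence[float]]) -> List[float]:
    """Mean tag count per bin across genes: the 'average pattern'."""
    n_bins = len(genes[0])
    return [sum(g[i] for g in genes) / len(genes) for i in range(n_bins)]


def profile_weighted(bins: Sequence[float], profile: Sequence[float]) -> float:
    """Least-squares amplitude a minimising sum_i (bins[i] - a * profile[i])**2.

    Closed form a = <bins, profile> / <profile, profile>: bins where the
    average pattern is strong contribute more, so the estimate follows the
    spatial shape of enrichment instead of a flat total.
    """
    num = sum(x * p for x, p in zip(bins, profile))
    den = sum(p * p for p in profile)
    return num / den if den > 0 else 0.0


def gcv(y: Sequence[float], y_hat: Sequence[float], effective_params: float) -> float:
    """Generalized cross-validation score: (RSS / N) / (1 - C / N)**2.

    C = effective_params is the model's effective number of parameters;
    for MARS it includes a per-knot penalty, assumed to be computed by the caller.
    """
    n = len(y)
    if effective_params >= n:
        raise ValueError("effective_params must be smaller than the sample size")
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return (rss / n) / (1.0 - effective_params / n) ** 2


if __name__ == "__main__":
    # Hypothetical binned tag counts (5' -> 3') for three genes.
    genes = [[8, 6, 3, 1], [4, 3, 2, 1], [0, 1, 1, 0]]
    prof = average_profile(genes)
    print(tag_count(genes[0]))               # 18.0 (whole-gene window)
    print(tag_count(genes[0], 0, 2))         # 14.0 (5' window: first two bins)
    print(profile_weighted(genes[0], prof))  # ~1.859 (amplitude relative to the average gene)
```

In this sketch, each per-gene vector of estimates (whole-gene counts, 5'-window counts, or profile-weighted amplitudes) would serve as a set of MARS predictors of expression, and `gcv` would compare the resulting fits. A lower score indicates a better trade-off between residual error and model complexity.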