A novel k-mer mixture logistic regression for methylation susceptibility modeling of CpG dinucleotides in human gene promoters

BACKGROUND: DNA methylation is essential for normal development and differentiation and plays a crucial role in the development of nearly all types of cancer. Aberrant DNA methylation patterns, including genome-wide hypomethylation and region-specific hypermethylation, are frequently observed and co...

Bibliographic Details
Main authors: Yang, Youngik; Nephew, Kenneth; Kim, Sun
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3311103/
https://www.ncbi.nlm.nih.gov/pubmed/22536899
http://dx.doi.org/10.1186/1471-2105-13-S3-S15
author Yang, Youngik
Nephew, Kenneth
Kim, Sun
collection PubMed
description BACKGROUND: DNA methylation is essential for normal development and differentiation and plays a crucial role in the development of nearly all types of cancer. Aberrant DNA methylation patterns, including genome-wide hypomethylation and region-specific hypermethylation, are frequently observed and contribute to the malignant phenotype. A number of studies have recently identified distinct features of genomic sequences that can be used for modeling specific DNA sequences that may be susceptible to aberrant CpG methylation in both cancer and normal cells. Although it is now possible, using next-generation sequencing technologies, to assess human methylomes at base resolution, no reports currently exist on modeling cell type-specific DNA methylation susceptibility. Thus, we conducted a comprehensive modeling study of cell type-specific DNA methylation susceptibility at three different resolutions: CpG dinucleotides, CpG segments, and individual gene promoter regions. RESULTS: Using a k-mer mixture logistic regression model, we effectively modeled DNA methylation susceptibility across five different cell types. Further, at the segment level, we achieved up to 0.75 in AUC prediction accuracy in a 10-fold cross-validation study using a mixture of k-mers. CONCLUSIONS: The significance of these results is threefold: 1) this is the first report to indicate that CpG methylation-susceptible "segments" exist; 2) our model demonstrates the significance of certain k-mers for the mixture model, potentially highlighting DNA sequence features (k-mers) of differentially methylated promoter CpG island sequences across different tissue types; 3) as only 3 or 4 bp patterns had previously been used for modeling DNA methylation susceptibility, ours is the first demonstration that 6-mer modeling can be performed without loss of accuracy.
format Online
Article
Text
id pubmed-3311103
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3311103 2012-04-02 A novel k-mer mixture logistic regression for methylation susceptibility modeling of CpG dinucleotides in human gene promoters Yang, Youngik Nephew, Kenneth Kim, Sun BMC Bioinformatics Proceedings BioMed Central 2012-03-21 /pmc/articles/PMC3311103/ /pubmed/22536899 http://dx.doi.org/10.1186/1471-2105-13-S3-S15 Text en Copyright ©2012 Yang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A novel k-mer mixture logistic regression for methylation susceptibility modeling of CpG dinucleotides in human gene promoters
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3311103/
https://www.ncbi.nlm.nih.gov/pubmed/22536899
http://dx.doi.org/10.1186/1471-2105-13-S3-S15