GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences

BACKGROUND: As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. The generally accepted criteria of...

Bibliographic Details
Main Authors: Yu, Ning, Guo, Xuan, Zelikovsky, Alexander, Pan, Yi
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5461559/
https://www.ncbi.nlm.nih.gov/pubmed/28589860
http://dx.doi.org/10.1186/s12864-017-3731-5
_version_ 1783242356243300352
author Yu, Ning
Guo, Xuan
Zelikovsky, Alexander
Pan, Yi
author_facet Yu, Ning
Guo, Xuan
Zelikovsky, Alexander
Pan, Yi
author_sort Yu, Ning
collection PubMed
description BACKGROUND: As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. The generally accepted criteria for a CGI are: (a) the %G+C content is ≥ 50%, (b) the ratio of the observed CpG content to the expected CpG content is ≥ 0.6, and (c) the length of the CGI is greater than 200 nucleotides. Most existing computational methods for the prediction of CpG islands are programmed on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases %G+C is < 50%, CpG (obs)/CpG (exp) varies, and the length of a CGI ranges from eight nucleotides to a few thousand nucleotides. This implies that CGI detection is not a purely statistical task and that some as yet unrevealed rules are probably hidden. RESULTS: A novel Gaussian model, GaussianCpG, is developed for the detection of CpG islands in the human genome. We analyze the energy distribution over the genomic primary structure for each CpG site and adopt the parameters from statistics of the human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG can achieve better performance in CGI detection. CONCLUSIONS: Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by a linear statistical method but by Gaussian energy distribution and accumulation. The parameters of the Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. Using pseudopotential analysis on CpG islands, the novel model is validated on both real and artificial data sets.
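The three conventional criteria named in the abstract can be illustrated with a minimal Python sketch. This is not the GaussianCpG model, whose parameters the abstract does not give; it only checks one candidate window against the classical thresholds. The function name `cgi_criteria` and its parameters are hypothetical, and the expected CpG count is assumed to follow the commonly used formula (number of C × number of G) / window length, which the abstract itself does not spell out.

```python
def cgi_criteria(seq, min_gc=0.5, min_ratio=0.6, min_len=200):
    """Check one candidate window against the three classical CGI criteria.

    Assumption: expected CpG count = (#C * #G) / length, a common convention
    that is not stated in the abstract itself.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return {"length": 0, "gc": 0.0, "obs_exp": 0.0, "passes": False}

    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")          # observed CpG dinucleotides

    gc = (c + g) / n               # criterion (a): G+C fraction
    expected = c * g / n           # assumed expected CpG count
    obs_exp = cpg / expected if expected > 0 else 0.0  # criterion (b)

    passes = gc >= min_gc and obs_exp >= min_ratio and n > min_len  # (c): length
    return {"length": n, "gc": gc, "obs_exp": obs_exp, "passes": passes}


if __name__ == "__main__":
    # A toy CpG-rich window and a toy AT-only window, both 300 nt long.
    print(cgi_criteria("CG" * 150))
    print(cgi_criteria("AT" * 150))
```

As the abstract notes, many experimentally verified islands fail one or more of these thresholds, so a filter of this kind is a baseline for comparison rather than a detector in its own right.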
format Online
Article
Text
id pubmed-5461559
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5461559 2017-06-07 GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences Yu, Ning Guo, Xuan Zelikovsky, Alexander Pan, Yi BMC Genomics Research BioMed Central 2017-05-24 /pmc/articles/PMC5461559/ /pubmed/28589860 http://dx.doi.org/10.1186/s12864-017-3731-5 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Yu, Ning
Guo, Xuan
Zelikovsky, Alexander
Pan, Yi
GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences
title GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5461559/
https://www.ncbi.nlm.nih.gov/pubmed/28589860
http://dx.doi.org/10.1186/s12864-017-3731-5