GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences
BACKGROUND: As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. The generally accepted criteria of...
Main authors: | Yu, Ning; Guo, Xuan; Zelikovsky, Alexander; Pan, Yi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5461559/ https://www.ncbi.nlm.nih.gov/pubmed/28589860 http://dx.doi.org/10.1186/s12864-017-3731-5 |
_version_ | 1783242356243300352 |
---|---|
author | Yu, Ning Guo, Xuan Zelikovsky, Alexander Pan, Yi |
author_facet | Yu, Ning Guo, Xuan Zelikovsky, Alexander Pan, Yi |
author_sort | Yu, Ning |
collection | PubMed |
description | BACKGROUND: As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. The generally accepted criteria for a CGI are: (a) %G+C content ≥ 50%, (b) a ratio of observed to expected CpG content ≥ 0.6, and (c) a length greater than 200 nucleotides. Most existing computational methods for the prediction of CpG islands are built on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases %G+C is < 50%, CpG (obs)/CpG (exp) varies, and the length of CGI ranges from eight nucleotides to a few thousand nucleotides. This implies that CGI detection is not simply a statistical task and that some as-yet-unrevealed rules probably remain hidden. RESULTS: A novel Gaussian model, GaussianCpG, is developed for detection of CpG islands in the human genome. We analyze the energy distribution over genomic primary structure for each CpG site and adopt the parameters from statistics of the human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing both sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG can achieve better performance in CGI detection. CONCLUSIONS: Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by the linear statistical method but by the Gaussian energy distribution and accumulation. The parameters of the Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. Using pseudopotential analysis on CpG islands, the novel model is validated on both real and artificial data sets. |
format | Online Article Text |
id | pubmed-5461559 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5461559 2017-06-07 GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences Yu, Ning Guo, Xuan Zelikovsky, Alexander Pan, Yi BMC Genomics Research BACKGROUND: As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. The generally accepted criteria for a CGI are: (a) %G+C content ≥ 50%, (b) a ratio of observed to expected CpG content ≥ 0.6, and (c) a length greater than 200 nucleotides. Most existing computational methods for the prediction of CpG islands are built on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases %G+C is < 50%, CpG (obs)/CpG (exp) varies, and the length of CGI ranges from eight nucleotides to a few thousand nucleotides. This implies that CGI detection is not simply a statistical task and that some as-yet-unrevealed rules probably remain hidden. RESULTS: A novel Gaussian model, GaussianCpG, is developed for detection of CpG islands in the human genome. We analyze the energy distribution over genomic primary structure for each CpG site and adopt the parameters from statistics of the human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing both sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG can achieve better performance in CGI detection. CONCLUSIONS: Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by the linear statistical method but by the Gaussian energy distribution and accumulation. The parameters of the Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. Using pseudopotential analysis on CpG islands, the novel model is validated on both real and artificial data sets. BioMed Central 2017-05-24 /pmc/articles/PMC5461559/ /pubmed/28589860 http://dx.doi.org/10.1186/s12864-017-3731-5 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Yu, Ning Guo, Xuan Zelikovsky, Alexander Pan, Yi GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences |
title | GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences |
title_full | GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences |
title_fullStr | GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences |
title_full_unstemmed | GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences |
title_short | GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences |
title_sort | gaussiancpg: a gaussian model for detection of cpg island in human genome sequences |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5461559/ https://www.ncbi.nlm.nih.gov/pubmed/28589860 http://dx.doi.org/10.1186/s12864-017-3731-5 |
work_keys_str_mv | AT yuning gaussiancpgagaussianmodelfordetectionofcpgislandinhumangenomesequences AT guoxuan gaussiancpgagaussianmodelfordetectionofcpgislandinhumangenomesequences AT zelikovskyalexander gaussiancpgagaussianmodelfordetectionofcpgislandinhumangenomesequences AT panyi gaussiancpgagaussianmodelfordetectionofcpgislandinhumangenomesequences |
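
The abstract in this record lists three conventional CGI criteria (%G+C ≥ 50%, observed/expected CpG ≥ 0.6, length greater than 200 nucleotides). As a rough illustration of that rule-based baseline only, and not of the GaussianCpG model, whose parameters the record does not give, the check might be sketched as follows. The function names are hypothetical, and the observed/expected ratio is assumed to follow the commonly used form (CpG count × sequence length) / (C count × G count).

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G) (assumed form)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    cpg_count = seq.count("CG")
    return cpg_count * len(seq) / (c_count * g_count)


def meets_classic_cgi_criteria(seq: str,
                               min_gc: float = 0.5,
                               min_ratio: float = 0.6,
                               min_len: int = 200) -> bool:
    """Apply the three conventional thresholds quoted in the abstract."""
    return (len(seq) > min_len
            and gc_content(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_ratio)
```

For instance, `meets_classic_cgi_criteria("CG" * 101)` passes all three thresholds, while the same repeat cut to exactly 200 nucleotides fails the length test. The abstract's point is that experimentally verified islands often fall outside these cutoffs, which is why a fixed-threshold check like this is only a baseline.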