Bis-class: a new classification tool of methylation status using bayes classifier and local methylation information
BACKGROUND: Whole genome sequencing of bisulfite converted DNA (‘methylC-seq’) method provides comprehensive information of DNA methylation. An important application of these whole genome methylation maps is classifying each position as a methylated versus non-methylated nucleotide. A widely used cu...
Main authors: | Huh, Iksoo; Yang, Xingyu; Park, Taesung; Yi, Soojin V |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4117951/ https://www.ncbi.nlm.nih.gov/pubmed/25037738 http://dx.doi.org/10.1186/1471-2164-15-608 |
_version_ | 1782328764276932608 |
---|---|
author | Huh, Iksoo Yang, Xingyu Park, Taesung Yi, Soojin V |
author_facet | Huh, Iksoo Yang, Xingyu Park, Taesung Yi, Soojin V |
author_sort | Huh, Iksoo |
collection | PubMed |
description | BACKGROUND: Whole genome sequencing of bisulfite converted DNA (‘methylC-seq’) method provides comprehensive information of DNA methylation. An important application of these whole genome methylation maps is classifying each position as a methylated versus non-methylated nucleotide. A widely used current method for this purpose, the so-called binomial method, is intuitive and straightforward, but lacks power when the sequence coverage and the genome-wide methylation level are low. These problems present a particular challenge when analyzing sparsely methylated genomes, such as those of many invertebrates and plants. RESULTS: We demonstrate that the number of sequence reads per position from methylC-seq data displays a large variance and can be modeled as a shifted negative binomial distribution. We also show that DNA methylation levels of adjacent CpG sites are correlated, and this similarity in local DNA methylation levels extends several kilobases. Taking these observations into account, we propose a new method based on Bayesian classification to infer DNA methylation status while considering the neighborhood DNA methylation levels of a specific site. We show that our approach has higher sensitivity and better classification performance than the binomial method via multiple analyses, including computational simulations, Area Under Curve (AUC) analyses, and improved consistencies across biological replicates. This method is especially advantageous in the analyses of sparsely methylated genomes with low coverage. CONCLUSIONS: Our method improves the existing binomial method for binary methylation calls by utilizing a posterior odds framework and incorporating local methylation information. This method should be widely applicable to the analyses of methylC-seq data from diverse sparsely methylated genomes. Bis-Class and example data are provided at a dedicated website (http://bibs.snu.ac.kr/software/Bisclass). 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-608) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4117951 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-41179512014-08-05 Bis-class: a new classification tool of methylation status using bayes classifier and local methylation information Huh, Iksoo Yang, Xingyu Park, Taesung Yi, Soojin V BMC Genomics Methodology Article BACKGROUND: Whole genome sequencing of bisulfite converted DNA (‘methylC-seq’) method provides comprehensive information of DNA methylation. An important application of these whole genome methylation maps is classifying each position as a methylated versus non-methylated nucleotide. A widely used current method for this purpose, the so-called binomial method, is intuitive and straightforward, but lacks power when the sequence coverage and the genome-wide methylation level are low. These problems present a particular challenge when analyzing sparsely methylated genomes, such as those of many invertebrates and plants. RESULTS: We demonstrate that the number of sequence reads per position from methylC-seq data displays a large variance and can be modeled as a shifted negative binomial distribution. We also show that DNA methylation levels of adjacent CpG sites are correlated, and this similarity in local DNA methylation levels extends several kilobases. Taking these observations into account, we propose a new method based on Bayesian classification to infer DNA methylation status while considering the neighborhood DNA methylation levels of a specific site. We show that our approach has higher sensitivity and better classification performance than the binomial method via multiple analyses, including computational simulations, Area Under Curve (AUC) analyses, and improved consistencies across biological replicates. This method is especially advantageous in the analyses of sparsely methylated genomes with low coverage. CONCLUSIONS: Our method improves the existing binomial method for binary methylation calls by utilizing a posterior odds framework and incorporating local methylation information. 
This method should be widely applicable to the analyses of methylC-seq data from diverse sparsely methylated genomes. Bis-Class and example data are provided at a dedicated website (http://bibs.snu.ac.kr/software/Bisclass). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-608) contains supplementary material, which is available to authorized users. BioMed Central 2014-07-18 /pmc/articles/PMC4117951/ /pubmed/25037738 http://dx.doi.org/10.1186/1471-2164-15-608 Text en © Huh et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Huh, Iksoo Yang, Xingyu Park, Taesung Yi, Soojin V Bis-class: a new classification tool of methylation status using bayes classifier and local methylation information |
title | Bis-class: a new classification tool of methylation status using bayes classifier and local methylation information |
title_full | Bis-class: a new classification tool of methylation status using bayes classifier and local methylation information |
title_fullStr | Bis-class: a new classification tool of methylation status using bayes classifier and local methylation information |
title_full_unstemmed | Bis-class: a new classification tool of methylation status using bayes classifier and local methylation information |
title_short | Bis-class: a new classification tool of methylation status using bayes classifier and local methylation information |
title_sort | bis-class: a new classification tool of methylation status using bayes classifier and local methylation information |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4117951/ https://www.ncbi.nlm.nih.gov/pubmed/25037738 http://dx.doi.org/10.1186/1471-2164-15-608 |
work_keys_str_mv | AT huhiksoo bisclassanewclassificationtoolofmethylationstatususingbayesclassifierandlocalmethylationinformation AT yangxingyu bisclassanewclassificationtoolofmethylationstatususingbayesclassifierandlocalmethylationinformation AT parktaesung bisclassanewclassificationtoolofmethylationstatususingbayesclassifierandlocalmethylationinformation AT yisoojinv bisclassanewclassificationtoolofmethylationstatususingbayesclassifierandlocalmethylationinformation |
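
The abstract contrasts the binomial method with a posterior odds framework that uses local methylation information. The sketch below illustrates that contrast in general terms only; it is not the Bis-Class implementation. The record does not give the model's formulas, so everything here is an assumption made for illustration: the function names, the use of a plain binomial likelihood for both classes, the fixed `meth_level` for methylated sites, and the idea of passing the neighborhood methylation signal in as the prior `prior_meth`. The shifted negative binomial coverage model mentioned in the abstract is not modeled.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_pvalue(k, n, error_rate):
    """Binomial method: P(X >= k) under the null that the site is unmethylated.

    k: reads supporting methylation, n: total reads at the site,
    error_rate: assumed rate of unconverted reads at unmethylated sites
    (hypothetical parameter, must lie strictly between 0 and 1).
    """
    return sum(binom_pmf(i, n, error_rate) for i in range(k, n + 1))


def posterior_odds(k, n, error_rate, meth_level, prior_meth):
    """Posterior odds of 'methylated' versus 'unmethylated' for one site.

    meth_level: assumed fraction of methylated reads at a methylated site.
    prior_meth: prior probability of methylation, which in this sketch stands
    in for information taken from neighboring sites (an assumption).
    """
    like_m = binom_pmf(k, n, meth_level)   # likelihood if methylated
    like_u = binom_pmf(k, n, error_rate)   # likelihood if unmethylated
    return (like_m * prior_meth) / (like_u * (1 - prior_meth))


if __name__ == "__main__":
    # A low-coverage site: 1 methylated read out of 3.
    k, n, err = 1, 3, 0.01
    print("binomial p-value:", binomial_pvalue(k, n, err))
    # Same data, two different neighborhoods (hypothetical priors).
    print("odds, methylated neighborhood:", posterior_odds(k, n, err, 0.8, 0.5))
    print("odds, sparse neighborhood:", posterior_odds(k, n, err, 0.8, 0.1))
```

With these made-up numbers, the same read counts give posterior odds above 1 when the prior is 0.5 and below 1 when it is 0.1, whereas the binomial p-value depends only on the counts at the site itself. That difference is the intuition behind incorporating local methylation information at low coverage.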