Bis-class: a new classification tool of methylation status using bayes classifier and local methylation information

Bibliographic Details
Main authors: Huh, Iksoo, Yang, Xingyu, Park, Taesung, Yi, Soojin V
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4117951/
https://www.ncbi.nlm.nih.gov/pubmed/25037738
http://dx.doi.org/10.1186/1471-2164-15-608
collection PubMed
description BACKGROUND: Whole genome sequencing of bisulfite-converted DNA (the ‘methylC-seq’ method) provides comprehensive information on DNA methylation. An important application of these whole genome methylation maps is classifying each position as a methylated or non-methylated nucleotide. A widely used current method for this purpose, the so-called binomial method, is intuitive and straightforward, but lacks power when sequence coverage and the genome-wide methylation level are low. These problems present a particular challenge when analyzing sparsely methylated genomes, such as those of many invertebrates and plants.
RESULTS: We demonstrate that the number of sequence reads per position in methylC-seq data has a large variance and can be modeled as a shifted negative binomial distribution. We also show that DNA methylation levels of adjacent CpG sites are correlated, and that this similarity in local DNA methylation levels extends over several kilobases. Taking these observations into account, we propose a new method based on Bayesian classification that infers the DNA methylation status of a specific site while considering the DNA methylation levels of its neighborhood. We show that our approach has higher sensitivity and better classification performance than the binomial method through multiple analyses, including computational simulations, Area Under the Curve (AUC) analyses, and improved consistency across biological replicates. This method is especially advantageous in the analyses of sparsely methylated genomes with low coverage.
CONCLUSIONS: Our method improves on the existing binomial method for binary methylation calls by using a posterior odds framework and incorporating local methylation information. This method should be widely applicable to the analyses of methylC-seq data from diverse sparsely methylated genomes. Bis-Class and example data are provided at a dedicated website (http://bibs.snu.ac.kr/software/Bisclass).
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-608) contains supplementary material, which is available to authorized users.
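The abstract contrasts the binomial method with a posterior-odds (Bayes) classifier that borrows information from neighboring CpG sites, and it models per-site read depth as a shifted negative binomial. The Python sketch below illustrates those three ideas on toy data; it is not the Bis-Class implementation, and its likelihoods and prior may differ from the paper's. All function names, the error rate `p0` (standing in for the bisulfite non-conversion/error rate), the methylated-read probability `p1`, the significance level `alpha` (set low to mimic a multiple-testing-adjusted threshold), the 1000 bp window, the pseudo-count prior, and the depth shift of 1 are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

# A site is (genomic position in bp, methylated read count, total read count).
Site = Tuple[int, int, int]


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binomial_method_call(k: int, n: int, p0: float, alpha: float) -> bool:
    """Binomial-method style call: methylated if k or more methylated reads
    out of n would be unlikely (one-sided p-value < alpha) if methylated
    reads arose only from non-conversion/sequencing error at rate p0."""
    p_value = sum(binom_pmf(j, n, p0) for j in range(k, n + 1))
    return p_value < alpha


def shifted_nb_pmf(d: int, r: float, p: float, shift: int = 1) -> float:
    """P(D = d) for read depth D = shift + Y, where Y ~ NegativeBinomial(r, p)
    counts failures before the r-th success. The shift of 1 (no zero-depth
    sites) is an assumed convention, not taken from the paper."""
    y = d - shift
    if y < 0:
        return 0.0
    log_pmf = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_pmf)


def local_prior(sites: List[Site], index: int, window: int,
                pseudo: float = 1.0) -> float:
    """Hypothetical prior P(methylated) for sites[index]: the pooled
    methylation level of the other sites within `window` bp, smoothed by
    pseudo-counts so that it stays strictly between 0 and 1."""
    center = sites[index][0]
    meth, total = 0, 0
    for i, (pos, k, n) in enumerate(sites):
        if i != index and abs(pos - center) <= window:
            meth += k
            total += n
    return (meth + pseudo) / (total + 2 * pseudo)


def posterior_odds(k: int, n: int, prior: float, p0: float, p1: float) -> float:
    """Posterior odds (methylated : unmethylated) = prior odds x likelihood
    ratio, with k ~ Binomial(n, p1) if methylated and k ~ Binomial(n, p0)
    if not. The binomial coefficient cancels in the ratio."""
    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = (p1 / p0) ** k * ((1.0 - p1) / (1.0 - p0)) ** (n - k)
    return prior_odds * likelihood_ratio


def classify_sites(sites: List[Site], window: int, p0: float,
                   p1: float) -> List[bool]:
    """Call each site methylated when its posterior odds exceed 1."""
    calls = []
    for i, (_, k, n) in enumerate(sites):
        prior = local_prior(sites, i, window)
        calls.append(posterior_odds(k, n, prior, p0, p1) > 1.0)
    return calls


if __name__ == "__main__":
    # Toy data: the site at 200 has low coverage (1 of 2 reads methylated)
    # but sits among well-methylated neighbors; the site at 5000 is isolated.
    sites = [(100, 8, 10), (150, 9, 10), (200, 1, 2), (250, 7, 9), (5000, 0, 12)]
    p0, p1 = 0.01, 0.8  # hypothetical error rate and methylated-read probability
    binom_calls = [binomial_method_call(k, n, p0, alpha=0.001) for _, k, n in sites]
    bayes_calls = classify_sites(sites, window=1000, p0=p0, p1=p1)
    print("binomial:", binom_calls)  # binomial: [True, True, False, True, False]
    print("bayes:   ", bayes_calls)  # bayes:    [True, True, True, True, False]
```

In this toy run, the low-coverage site at position 200 fails the binomial test, while the posterior-odds rule calls it methylated because its neighbors are heavily methylated. The same 1-of-2 read counts with a prior of 1/31 (an unmethylated neighborhood) give posterior odds below 1, so the local prior can also suppress calls. The shifted negative binomial function is shown on its own as a depth model, for example for simulating coverage; it is not wired into the classifier here.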
id pubmed-4117951
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Genomics
article_type Methodology Article
published 2014-07-18
rights © Huh et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article