A Bayesian Framework to Identify Methylcytosines from High-Throughput Bisulfite Sequencing Data
High-throughput bisulfite sequencing technologies have provided a comprehensive and well-fitted way to investigate DNA methylation at single-base resolution. However, there are substantial bioinformatic challenges to distinguish precisely methylcytosines from unconverted cytosines based on bisulfite...
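Main authors: | Xie, Qing; Liu, Qi; Mao, Fengbiao; Cai, Wanshi; Wu, Honghu; You, Mingcong; Wang, Zhen; Chen, Bingyu; Sun, Zhong Sheng; Wu, Jinyu |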
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4177668/ https://www.ncbi.nlm.nih.gov/pubmed/25255082 http://dx.doi.org/10.1371/journal.pcbi.1003853 |
_version_ | 1782336809386115072 |
---|---|
author | Xie, Qing; Liu, Qi; Mao, Fengbiao; Cai, Wanshi; Wu, Honghu; You, Mingcong; Wang, Zhen; Chen, Bingyu; Sun, Zhong Sheng; Wu, Jinyu |
author_facet | Xie, Qing; Liu, Qi; Mao, Fengbiao; Cai, Wanshi; Wu, Honghu; You, Mingcong; Wang, Zhen; Chen, Bingyu; Sun, Zhong Sheng; Wu, Jinyu |
author_sort | Xie, Qing |
collection | PubMed |
description | High-throughput bisulfite sequencing technologies have provided a comprehensive and well-fitted way to investigate DNA methylation at single-base resolution. However, there are substantial bioinformatic challenges to distinguish precisely methylcytosines from unconverted cytosines based on bisulfite sequencing data. The challenges arise, at least in part, from cell heterozygosis caused by multicellular sequencing and the still limited number of statistical methods that are available for methylcytosine calling based on bisulfite sequencing data. Here, we present an algorithm, termed Bycom, a new Bayesian model that can perform methylcytosine calling with high accuracy. Bycom considers cell heterozygosis along with sequencing errors and bisulfite conversion efficiency to improve calling accuracy. Bycom performance was compared with the performance of Lister, the method most widely used to identify methylcytosines from bisulfite sequencing data. The results showed that the performance of Bycom was better than that of Lister for data with high methylation levels. Bycom also showed higher sensitivity and specificity for low methylation level samples (<1%) than Lister. A validation experiment based on reduced representation bisulfite sequencing data suggested that Bycom had a false positive rate of about 4% while maintaining an accuracy of close to 94%. This study demonstrated that Bycom had a low false calling rate at any methylation level and accurate methylcytosine calling at high methylation levels. Bycom will contribute significantly to studies aimed at recalibrating the methylation level of genomic regions based on the presence of methylcytosines. |
format | Online Article Text |
id | pubmed-4177668 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4177668 2014-10-02 A Bayesian Framework to Identify Methylcytosines from High-Throughput Bisulfite Sequencing Data. Xie, Qing; Liu, Qi; Mao, Fengbiao; Cai, Wanshi; Wu, Honghu; You, Mingcong; Wang, Zhen; Chen, Bingyu; Sun, Zhong Sheng; Wu, Jinyu. PLoS Comput Biol. Research Article. Public Library of Science 2014-09-25 /pmc/articles/PMC4177668/ /pubmed/25255082 http://dx.doi.org/10.1371/journal.pcbi.1003853 Text en © 2014 Xie et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article. Xie, Qing; Liu, Qi; Mao, Fengbiao; Cai, Wanshi; Wu, Honghu; You, Mingcong; Wang, Zhen; Chen, Bingyu; Sun, Zhong Sheng; Wu, Jinyu. A Bayesian Framework to Identify Methylcytosines from High-Throughput Bisulfite Sequencing Data |
title | A Bayesian Framework to Identify Methylcytosines from High-Throughput Bisulfite Sequencing Data |
title_full | A Bayesian Framework to Identify Methylcytosines from High-Throughput Bisulfite Sequencing Data |
title_fullStr | A Bayesian Framework to Identify Methylcytosines from High-Throughput Bisulfite Sequencing Data |
title_full_unstemmed | A Bayesian Framework to Identify Methylcytosines from High-Throughput Bisulfite Sequencing Data |
title_short | A Bayesian Framework to Identify Methylcytosines from High-Throughput Bisulfite Sequencing Data |
title_sort | bayesian framework to identify methylcytosines from high-throughput bisulfite sequencing data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4177668/ https://www.ncbi.nlm.nih.gov/pubmed/25255082 http://dx.doi.org/10.1371/journal.pcbi.1003853 |
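
The description field above says that the model weighs sequencing errors, bisulfite conversion efficiency, and a mixed cell population when calling a methylcytosine. The sketch below illustrates that general idea only; it is not the Bycom algorithm or the Lister method, whose details this record does not give. The function names, the default rates (`conversion_rate`, `seq_error`, `prior`), and the uniform grid over the methylated cell fraction are all assumptions chosen for illustration. A plain binomial tail probability is included for comparison with the Bayesian posterior.

```python
from math import comb


def non_conversion_prob(conversion_rate, seq_error):
    """Chance that a read shows C at an unmethylated site (assumed error model).

    Either the C escaped conversion and was read correctly, or it was
    converted to T and then misread as C (one of three possible miscalls).
    """
    return (1 - conversion_rate) * (1 - seq_error) + conversion_rate * seq_error / 3


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided binomial test p-value."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def posterior_methylated(k, n, conversion_rate=0.99, seq_error=0.005,
                         prior=0.05, grid=50):
    """Posterior probability that a site is methylated in some cells.

    k: reads showing C at the site, n: total reads covering it.
    All default values are illustrative assumptions, not published parameters.
    """
    p0 = non_conversion_prob(conversion_rate, seq_error)
    # Likelihood if the site is unmethylated in every cell.
    like_unmeth = binom_pmf(k, n, p0)
    # Likelihood if a fraction m of cells is methylated, averaged over a
    # uniform grid of m values to reflect an unknown cell mixture.
    like_meth = 0.0
    for i in range(1, grid + 1):
        m = i / grid
        p_c = m * (1 - seq_error) + (1 - m) * p0
        like_meth += binom_pmf(k, n, p_c)
    like_meth /= grid
    # Bayes' rule over the two hypotheses.
    numerator = prior * like_meth
    return numerator / (numerator + (1 - prior) * like_unmeth)


if __name__ == "__main__":
    p0 = non_conversion_prob(0.99, 0.005)
    for k in (0, 1, 3, 10):
        print(k, binom_tail(k, 20, p0), posterior_methylated(k, 20))
```

With these assumed settings the posterior rises as more of the 20 reads show C, while the binomial tail probability falls. The prior and the grid over cell fractions are the parts a real caller would need to justify from data.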