A Bayesian Framework to Identify Methylcytosines from High-Throughput Bisulfite Sequencing Data

High-throughput bisulfite sequencing technologies have provided a comprehensive and well-fitted way to investigate DNA methylation at single-base resolution. However, there are substantial bioinformatic challenges to distinguish precisely methylcytosines from unconverted cytosines based on bisulfite...

Bibliographic Details
Main Authors: Xie, Qing; Liu, Qi; Mao, Fengbiao; Cai, Wanshi; Wu, Honghu; You, Mingcong; Wang, Zhen; Chen, Bingyu; Sun, Zhong Sheng; Wu, Jinyu
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4177668/
https://www.ncbi.nlm.nih.gov/pubmed/25255082
http://dx.doi.org/10.1371/journal.pcbi.1003853
_version_ 1782336809386115072
author Xie, Qing
Liu, Qi
Mao, Fengbiao
Cai, Wanshi
Wu, Honghu
You, Mingcong
Wang, Zhen
Chen, Bingyu
Sun, Zhong Sheng
Wu, Jinyu
author_sort Xie, Qing
collection PubMed
description High-throughput bisulfite sequencing technologies have provided a comprehensive and well-fitted way to investigate DNA methylation at single-base resolution. However, there are substantial bioinformatic challenges to distinguish precisely methylcytosines from unconverted cytosines based on bisulfite sequencing data. The challenges arise, at least in part, from cell heterozygosis caused by multicellular sequencing and the still limited number of statistical methods that are available for methylcytosine calling based on bisulfite sequencing data. Here, we present an algorithm, termed Bycom, a new Bayesian model that can perform methylcytosine calling with high accuracy. Bycom considers cell heterozygosis along with sequencing errors and bisulfite conversion efficiency to improve calling accuracy. Bycom performance was compared with the performance of Lister, the method most widely used to identify methylcytosines from bisulfite sequencing data. The results showed that the performance of Bycom was better than that of Lister for data with high methylation levels. Bycom also showed higher sensitivity and specificity for low methylation level samples (<1%) than Lister. A validation experiment based on reduced representation bisulfite sequencing data suggested that Bycom had a false positive rate of about 4% while maintaining an accuracy of close to 94%. This study demonstrated that Bycom had a low false calling rate at any methylation level and accurate methylcytosine calling at high methylation levels. Bycom will contribute significantly to studies aimed at recalibrating the methylation level of genomic regions based on the presence of methylcytosines.
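The abstract contrasts a binomial-test style of methylcytosine calling (the approach associated with Lister) with a Bayesian call that accounts for sequencing error, bisulfite conversion efficiency, and mixed cell populations. The sketch below illustrates that general contrast only; it is not the Bycom model, whose details are not given in this record. The function names, the single combined `error_rate` (non-conversion plus sequencing error), the assumed methylated-cell fraction `meth_level`, and the `prior` are all hypothetical choices made for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): a binomial-test p-value.

    Here k is the number of reads showing C at a cytosine covered by n reads,
    and p is the assumed rate at which an unmethylated C is still read as C.
    """
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def posterior_methylated(k, n, error_rate, meth_level, prior):
    """Posterior probability of methylation under a two-hypothesis model.

    Assumptions (illustrative, not taken from the paper):
    - unmethylated site: a C read arises only through error_rate;
    - methylated site: a fraction meth_level of cells carries the mark,
      so the C-read probability is a mixture of the two cell states.
    """
    like_unmeth = binom_pmf(k, n, error_rate)
    p_c = meth_level * (1 - error_rate) + (1 - meth_level) * error_rate
    like_meth = binom_pmf(k, n, p_c)
    numerator = prior * like_meth
    return numerator / (numerator + (1 - prior) * like_unmeth)


if __name__ == "__main__":
    # Hypothetical site: 10 reads, 8 of them showing C.
    print(binomial_tail(8, 10, 0.01))
    print(posterior_methylated(8, 10, 0.01, 0.8, 0.05))
```

The binomial tail only asks whether the observed C reads are explainable by error alone, whereas the posterior also weighs how well a methylated site would explain them, which is why the two can disagree at low coverage or low methylation levels.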
format Online
Article
Text
id pubmed-4177668
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4177668 2014-10-02 A Bayesian Framework to Identify Methylcytosines from High-Throughput Bisulfite Sequencing Data Xie, Qing; Liu, Qi; Mao, Fengbiao; Cai, Wanshi; Wu, Honghu; You, Mingcong; Wang, Zhen; Chen, Bingyu; Sun, Zhong Sheng; Wu, Jinyu PLoS Comput Biol Research Article
Public Library of Science 2014-09-25 /pmc/articles/PMC4177668/ /pubmed/25255082 http://dx.doi.org/10.1371/journal.pcbi.1003853 Text en © 2014 Xie et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Bayesian Framework to Identify Methylcytosines from High-Throughput Bisulfite Sequencing Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4177668/
https://www.ncbi.nlm.nih.gov/pubmed/25255082
http://dx.doi.org/10.1371/journal.pcbi.1003853