BoostMe accurately predicts DNA methylation values in whole-genome bisulfite sequencing of multiple human tissues

Bibliographic Details
Main authors: Zou, Luli S., Erdos, Michael R., Taylor, D. Leland, Chines, Peter S., Varshney, Arushi, Parker, Stephen C. J., Collins, Francis S., Didion, John P.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5966887/
https://www.ncbi.nlm.nih.gov/pubmed/29792182
http://dx.doi.org/10.1186/s12864-018-4766-y
author Zou, Luli S.
Erdos, Michael R.
Taylor, D. Leland
Chines, Peter S.
Varshney, Arushi
Parker, Stephen C. J.
Collins, Francis S.
Didion, John P.
collection PubMed
description BACKGROUND: Bisulfite sequencing is widely employed to study the role of DNA methylation in disease; however, the data suffer from biases due to coverage depth variability. Imputation of methylation values at low-coverage sites may mitigate these biases while also identifying important genomic features associated with predictive power. RESULTS: Here we describe BoostMe, a method for imputing low-quality DNA methylation estimates within whole-genome bisulfite sequencing (WGBS) data. BoostMe uses a gradient boosting algorithm, XGBoost, and leverages information from multiple samples for prediction. We find that BoostMe outperforms existing algorithms in speed and accuracy when applied to WGBS of human tissues. Furthermore, we show that imputation improves concordance between WGBS and the MethylationEPIC array at low WGBS depth, suggesting improved WGBS accuracy after imputation. CONCLUSIONS: Our findings support the use of BoostMe as a preprocessing step for WGBS analysis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-4766-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5966887
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5966887 2018-05-24 BMC Genomics Methodology Article BioMed Central 2018-05-23 /pmc/articles/PMC5966887/ /pubmed/29792182 http://dx.doi.org/10.1186/s12864-018-4766-y Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title BoostMe accurately predicts DNA methylation values in whole-genome bisulfite sequencing of multiple human tissues
topic Methodology Article