Analyzing whole genome bisulfite sequencing data from highly divergent genotypes

In the study of DNA methylation, genetic variation between species, strains or individuals can result in CpG sites that are exclusive to a subset of samples, and insertions and deletions can rearrange the spatial distribution of CpGs. How to account for this variation in an analysis of the interplay...

Bibliographic Details
Main Authors: Wulfridge, Phillip; Langmead, Ben; Feinberg, Andrew P; Hansen, Kasper D
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821270/
https://www.ncbi.nlm.nih.gov/pubmed/31392989
http://dx.doi.org/10.1093/nar/gkz674
author Wulfridge, Phillip
Langmead, Ben
Feinberg, Andrew P
Hansen, Kasper D
collection PubMed
description In the study of DNA methylation, genetic variation between species, strains or individuals can result in CpG sites that are exclusive to a subset of samples, and insertions and deletions can rearrange the spatial distribution of CpGs. How to account for this variation in an analysis of the interplay between sequence variation and DNA methylation is not well understood, especially when the number of CpG differences between samples is large. Here, we use whole-genome bisulfite sequencing data on two highly divergent mouse strains to study this problem. We show that alignment to personal genomes is necessary for valid methylation quantification. We introduce a method for including strain-specific CpGs in differential analysis, and show that this increases power. We apply our method to a human normal-cancer dataset, and show this improves accuracy and power, illustrating the broad applicability of our approach. Our method uses smoothing to impute methylation levels at strain-specific sites, thereby allowing strain-specific CpGs to contribute to the analysis, while accounting for differences in the spatial occurrences of CpGs. Our results have implications for joint analysis of genetic variation and DNA methylation using bisulfite-converted DNA, and unlock the use of personal genomes for addressing this question.
format Online
Article
Text
id pubmed-6821270
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6821270 2019-11-04
Nucleic Acids Res
Methods Online
Oxford University Press 2019-11-04 2019-08-08
/pmc/articles/PMC6821270/ /pubmed/31392989 http://dx.doi.org/10.1093/nar/gkz674
Text en
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Analyzing whole genome bisulfite sequencing data from highly divergent genotypes
topic Methods Online