
MethylToSNP: identifying SNPs in Illumina DNA methylation array data


Bibliographic Details
Main authors: LaBarre, Brenna A., Goncearenco, Alexander, Petrykowska, Hanna M., Jaratlerdsiri, Weerachai, Bornman, M. S. Riana, Hayes, Vanessa M., Elnitski, Laura
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923858/
https://www.ncbi.nlm.nih.gov/pubmed/31861999
http://dx.doi.org/10.1186/s13072-019-0321-6
collection PubMed
description BACKGROUND: Current array-based methods for measuring DNA methylation rely on sodium bisulfite conversion to differentiate between methylated and unmethylated cytosine bases in DNA. In the absence of genotype data, this process can lead to ambiguity in data interpretation when a sample has a polymorphism at a methylation probe site. A common way to minimize this problem is to exclude such potentially problematic sites, with some methods removing as much as 60% of array probes from consideration before data analysis.
RESULTS: Here, we present an algorithm, implemented in the R Bioconductor package MethylToSNP, that detects a characteristic data pattern to infer sites likely to be confounded by polymorphisms. The tool also provides a stringent reliability score that allows thresholding of SNP predictions. We calibrated the parameters and thresholds used by the algorithm on simulated and real methylation data sets. We illustrate findings using methylation data from YRI (Yoruba in Ibadan, Nigeria), CEPH (European descent) and KhoeSan (southern African) populations. Our polymorphism predictions made with MethylToSNP have been validated against SNP databases and by bisulfite and genomic sequencing.
CONCLUSIONS: The benefits of this method are threefold. First, it prevents extensive data loss by considering only SNPs specific to the individuals in the study. Second, it makes it possible to identify new polymorphisms in samples whose genetic landscape is poorly characterized. Third, it identifies variants in functional regions of the genome, such as CTCF (transcriptional repressor) sites and enhancers; these may be common alleles or personal mutations with the potential to deleteriously affect genomic regulatory activity. We demonstrate that MethylToSNP is applicable to Illumina 450K and 850K EPIC array data and is backward compatible with 27K methylation arrays. Going forward, this kind of nuanced approach can increase the information derived from precious data sets by considering each sample in a project individually, enabling more informed decisions about data cleaning.
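The abstract does not spell out the "characteristic data pattern" that MethylToSNP detects, so the following is only a toy heuristic in Python, not the package's algorithm, API, or reliability score. It assumes the common intuition that a polymorphism at a probe site makes beta values jump between discrete, genotype-like levels (near 0, about 0.5 and near 1, assuming the reference allele is methylated) instead of varying continuously. The function `snp_like` and its parameters `tol` and `min_clusters` are hypothetical names chosen for illustration.

```python
from typing import Sequence

# Hypothetical genotype-like levels (an assumption, not MethylToSNP's model):
# homozygous variant ~0, heterozygous ~0.5, homozygous reference ~1.
LEVELS = (0.0, 0.5, 1.0)


def snp_like(betas: Sequence[float], tol: float = 0.1, min_clusters: int = 2) -> bool:
    """Toy check: do a probe's beta values fall only into genotype-like clusters?

    `tol` and `min_clusters` are hypothetical parameters for illustration only.
    """
    if not betas:
        return False
    occupied = set()
    for b in betas:
        # Index of the genotype-like level closest to this sample's beta value.
        nearest = min(range(len(LEVELS)), key=lambda i: abs(b - LEVELS[i]))
        if abs(b - LEVELS[nearest]) > tol:
            return False  # an intermediate value breaks the discrete pattern
        occupied.add(nearest)
    # Require at least two distinct levels, i.e. samples that differ by "genotype".
    return len(occupied) >= min_clusters


if __name__ == "__main__":
    print(snp_like([0.02, 0.48, 0.97, 0.51, 0.03]))  # prints True
    print(snp_like([0.20, 0.35, 0.62, 0.80, 0.45]))  # prints False
```

A probe that is uniformly unmethylated (all values near 0) is not flagged, because a single occupied level gives no evidence that samples differ by genotype. Real data are noisier than this sketch allows, and a genuinely variable methylation site can mimic the pattern by chance, which is why a tool like MethylToSNP pairs its predictions with a reliability score.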
id pubmed-6923858
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Epigenetics Chromatin (Methodology), BioMed Central, published 2019-12-20
rights © The Author(s) 2019. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Methodology