“Gap hunting” to characterize clustered probe signals in Illumina methylation array data

BACKGROUND: The Illumina 450k array has been widely used in epigenetic association studies. Current quality-control (QC) pipelines typically remove certain sets of probes, such as those containing a SNP or with multiple mapping locations. An additional set of potentially problematic probes are those...

Bibliographic Details
Main Authors: Andrews, Shan V.; Ladd-Acosta, Christine; Feinberg, Andrew P.; Hansen, Kasper D.; Fallin, M. Daniele
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5142147/
https://www.ncbi.nlm.nih.gov/pubmed/27980682
http://dx.doi.org/10.1186/s13072-016-0107-z
_version_ 1782472733519511552
author Andrews, Shan V.
Ladd-Acosta, Christine
Feinberg, Andrew P.
Hansen, Kasper D.
Fallin, M. Daniele
collection PubMed
description BACKGROUND: The Illumina 450k array has been widely used in epigenetic association studies. Current quality-control (QC) pipelines typically remove certain sets of probes, such as those containing a SNP or with multiple mapping locations. An additional set of potentially problematic probes are those with DNA methylation distributions characterized by two or more distinct clusters separated by gaps. Data-driven identification of such probes may offer additional insights for downstream analyses. RESULTS: We developed a procedure, termed “gap hunting,” to identify probes showing clustered distributions. Among 590 peripheral blood samples from the Study to Explore Early Development, we identified 11,007 “gap probes.” The vast majority (9199) are likely attributable to an underlying SNP(s) or other variant in the probe, although SNP-affected probes exist that do not produce gap signals. Specific factors predict which SNPs lead to gap signals, including type of nucleotide change, probe type, DNA strand, and overall methylation state. These expected effects are demonstrated in paired genotype and 450k data on the same samples. Gap probes can also serve as a surrogate for the local genetic sequence on a haplotype scale and can be used to adjust for population stratification. CONCLUSIONS: The characteristics of gap probes reflect potentially informative biology. QC pipelines may benefit from an efficient data-driven approach that “flags” gap probes, rather than filtering such probes, followed by careful interpretation of downstream association analyses. Our results should translate directly to the recently released Illumina EPIC array given the similar chemistry and content design. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13072-016-0107-z) contains supplementary material, which is available to authorized users.
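The abstract describes gap probes as those whose methylation values fall into two or more distinct clusters separated by gaps. The following is a minimal Python sketch of that idea only, not the authors' implementation: the function names, the example beta values, and the gap threshold of 0.05 are all illustrative assumptions that do not come from this record.

```python
def split_into_clusters(betas, threshold=0.05):
    """Sort one probe's beta values and start a new cluster whenever two
    neighbouring values differ by more than `threshold`.

    `threshold` is a hypothetical default chosen for illustration.
    """
    ordered = sorted(betas)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > threshold:
            clusters.append([cur])       # gap found: open a new cluster
        else:
            clusters[-1].append(cur)     # no gap: extend the current cluster
    return clusters


def is_gap_probe(betas, threshold=0.05):
    """Flag a probe when its values form at least two clusters."""
    return len(split_into_clusters(betas, threshold)) >= 2


# Made-up beta values across samples for two hypothetical probes.
snp_like = [0.02, 0.03, 0.04, 0.48, 0.50, 0.52, 0.95, 0.97]
unimodal = [0.40, 0.42, 0.44, 0.45, 0.47, 0.49]

print(len(split_into_clusters(snp_like)), is_gap_probe(snp_like))
print(len(split_into_clusters(unimodal)), is_gap_probe(unimodal))
```

In line with the abstract's recommendation, a flag like this would be used to mark probes for careful interpretation rather than to filter them out. A practical procedure would also need a rule for small outlier groups, which this sketch omits.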
format Online
Article
Text
id pubmed-5142147
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5142147 2016-12-15 “Gap hunting” to characterize clustered probe signals in Illumina methylation array data Andrews, Shan V. Ladd-Acosta, Christine Feinberg, Andrew P. Hansen, Kasper D. Fallin, M. Daniele Epigenetics Chromatin Research BioMed Central 2016-12-07 /pmc/articles/PMC5142147/ /pubmed/27980682 http://dx.doi.org/10.1186/s13072-016-0107-z Text en © The Author(s) 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title “Gap hunting” to characterize clustered probe signals in Illumina methylation array data
topic Research