
A data-driven approach to preprocessing Illumina 450K methylation array data

BACKGROUND: As the most stable and experimentally accessible epigenetic mark, DNA methylation is of great interest to the research community. The landscape of DNA methylation across tissues, through development and in disease pathogenesis is not yet well characterized. Thus there is a need for rapid...


Bibliographic Details
Main authors: Pidsley, Ruth, Wong, Chloe C Y, Volta, Manuela, Lunnon, Katie, Mill, Jonathan, Schalkwyk, Leonard C
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3769145/
https://www.ncbi.nlm.nih.gov/pubmed/23631413
http://dx.doi.org/10.1186/1471-2164-14-293
author Pidsley, Ruth
Wong, Chloe C Y
Volta, Manuela
Lunnon, Katie
Mill, Jonathan
Schalkwyk, Leonard C
collection PubMed
description BACKGROUND: As the most stable and experimentally accessible epigenetic mark, DNA methylation is of great interest to the research community. The landscape of DNA methylation across tissues, through development and in disease pathogenesis is not yet well characterized. Thus there is a need for rapid and cost-effective methods for assessing genome-wide levels of DNA methylation. The Illumina Infinium HumanMethylation450 (450K) BeadChip is a very useful addition to the available methods for DNA methylation analysis, but its complex design, which incorporates two different assay methods (Type I and Type II), requires careful consideration. Accordingly, several normalization schemes have been published. We have taken advantage of known DNA methylation patterns associated with genomic imprinting and X-chromosome inactivation (XCI), in addition to the performance of SNP genotyping assays present on the array, to derive three independent metrics, which we use to test alternative schemes of correction and normalization. These metrics also have potential utility as quality scores for datasets.
RESULTS: The standard index of DNA methylation at any specific CpG site is β = M/(M + U + 100), where M and U are the methylated and unmethylated signal intensities, respectively. Betas (βs) calculated from raw signal intensities (the default GenomeStudio behavior) perform well, but using 11 methylomic datasets we demonstrate that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics. The commonly used procedure of normalizing betas is inferior to the separate normalization of M and U, and it is also advantageous to normalize Type I and Type II assays separately. More elaborate manipulation of quantiles proves to be counterproductive.
CONCLUSIONS: Careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely associated with complex disease phenotypes. For the convenience of the research community we have created a user-friendly R software package called wateRmelon, downloadable from Bioconductor and compatible with the existing methylumi, minfi and IMA packages, that allows others to apply the same normalization methods and data quality tests to 450K data.
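The RESULTS paragraph above gives the beta formula and states that M and U are best quantile-normalized separately, with Type I and Type II probes also handled separately. The following is a minimal Python sketch of that general idea only. It is not the wateRmelon implementation; the function names, the input layout (one list of intensities per sample) and the simple tie handling are assumptions made for illustration.

```python
from statistics import mean

OFFSET = 100  # the constant in beta = M / (M + U + 100)


def beta(m, u, offset=OFFSET):
    """Methylation index for one probe from methylated (m) and unmethylated (u) intensity."""
    return m / (m + u + offset)


def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties are broken by position (a simplification).
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [mean(s[k] for s in sorted_samples) for k in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def normalized_betas(meth, unmeth, probe_types):
    """Normalize M and U separately, and each probe type separately, then compute betas.

    meth, unmeth: one list of intensities per sample (hypothetical layout).
    probe_types: one label per probe, e.g. "I" or "II".
    """
    n_probes = len(probe_types)
    m_norm = [[0.0] * n_probes for _ in meth]
    u_norm = [[0.0] * n_probes for _ in unmeth]
    for probe_type in sorted(set(probe_types)):
        idx = [i for i, t in enumerate(probe_types) if t == probe_type]
        for source, target in ((meth, m_norm), (unmeth, u_norm)):
            subset = [[sample[i] for i in idx] for sample in source]
            for s, values in enumerate(quantile_normalize(subset)):
                for j, i in enumerate(idx):
                    target[s][i] = values[j]
    return [[beta(m, u) for m, u in zip(ms, us)]
            for ms, us in zip(m_norm, u_norm)]


print(beta(900, 0))  # 0.9
```

Because the offset of 100 sits in the denominator, beta stays below 1 even for a fully methylated probe and is defined when both intensities are zero.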
format Online
Article
Text
id pubmed-3769145
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3769145 2013-09-11 A data-driven approach to preprocessing Illumina 450K methylation array data Pidsley, Ruth; Wong, Chloe C Y; Volta, Manuela; Lunnon, Katie; Mill, Jonathan; Schalkwyk, Leonard C BMC Genomics Methodology Article BioMed Central 2013-05-01 /pmc/articles/PMC3769145/ /pubmed/23631413 http://dx.doi.org/10.1186/1471-2164-14-293 Text en Copyright © 2013 Pidsley et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A data-driven approach to preprocessing Illumina 450K methylation array data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3769145/
https://www.ncbi.nlm.nih.gov/pubmed/23631413
http://dx.doi.org/10.1186/1471-2164-14-293