An effective processing pipeline for harmonizing DNA methylation data from Illumina’s 450K and EPIC platforms for epidemiological studies

OBJECTIVE: Illumina BeadChip arrays are commonly used to generate DNA methylation data for large epidemiological studies. Updates in technology over time create challenges for data harmonization within and between studies, many of which obtained data from the older 450K and newer EPIC platforms. The...

Bibliographic Details
Main authors: Vanderlinden, Lauren A., Johnson, Randi K., Carry, Patrick M., Dong, Fran, DeMeo, Dawn L., Yang, Ivana V., Norris, Jill M., Kechris, Katerina
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8424820/
https://www.ncbi.nlm.nih.gov/pubmed/34496950
http://dx.doi.org/10.1186/s13104-021-05741-2
_version_ 1783749736054915072
author Vanderlinden, Lauren A.
Johnson, Randi K.
Carry, Patrick M.
Dong, Fran
DeMeo, Dawn L.
Yang, Ivana V.
Norris, Jill M.
Kechris, Katerina
collection PubMed
description OBJECTIVE: Illumina BeadChip arrays are commonly used to generate DNA methylation data for large epidemiological studies. Updates in technology over time create challenges for data harmonization within and between studies, many of which obtained data from the older 450K and newer EPIC platforms. The pre-processing pipeline for DNA methylation is not trivial, and influences the downstream analyses. Incorporating different platforms adds a new level of technical variability that has not yet been taken into account by recommended pipelines. Our study evaluated the performance of various tools on different versions of platform data harmonization at each step of the pre-processing pipeline, including quality control (QC), normalization, batch effect adjustment, and genomic inflation. We illustrate our novel approach using 450K and EPIC data from the Diabetes Autoimmunity Study in the Young (DAISY) prospective cohort. RESULTS: We found normalization and probe filtering had the biggest effect on data harmonization. Employing a meta-analysis was an effective and easily executable method for accounting for platform variability. Correcting for genomic inflation also helped with harmonization. We present guidelines for studies seeking to harmonize data from the 450K and EPIC platforms, which include the use of technical replicates for evaluating numerous pre-processing steps, and employing a meta-analysis. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13104-021-05741-2.
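The abstract names two steps that can be illustrated numerically: combining per-platform results in a meta-analysis and checking genomic inflation. The record does not say which estimators the authors used, so the following is only a minimal sketch under stated assumptions: a fixed-effect inverse-variance meta-analysis of one CpG's effect estimate from the 450K and EPIC analyses, and the common inflation factor lambda defined as the median observed chi-square statistic divided by the null median for 1 degree of freedom. The function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inverse_variance_meta(beta_450k, se_450k, beta_epic, se_epic):
    """Fixed-effect inverse-variance meta-analysis for a single CpG.

    Assumption: each platform was analysed separately, giving an effect
    estimate (beta) and its standard error (se) per platform.
    """
    w_450k = 1.0 / se_450k ** 2
    w_epic = 1.0 / se_epic ** 2
    beta = (w_450k * beta_450k + w_epic * beta_epic) / (w_450k + w_epic)
    se = math.sqrt(1.0 / (w_450k + w_epic))
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return beta, se, p


def genomic_inflation(z_scores):
    """Lambda: median observed chi-square (z squared) over the null median."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN
```

A lambda near 1 suggests little inflation of the test statistics; values well above 1 suggest that a correction may be worth considering.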
format Online
Article
Text
id pubmed-8424820
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8424820 2021-09-10 BMC Res Notes Research Note
BioMed Central 2021-09-08 /pmc/articles/PMC8424820/ /pubmed/34496950 http://dx.doi.org/10.1186/s13104-021-05741-2 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title An effective processing pipeline for harmonizing DNA methylation data from Illumina’s 450K and EPIC platforms for epidemiological studies
topic Research Note