An effective processing pipeline for harmonizing DNA methylation data from Illumina’s 450K and EPIC platforms for epidemiological studies
OBJECTIVE: Illumina BeadChip arrays are commonly used to generate DNA methylation data for large epidemiological studies. Updates in technology over time create challenges for data harmonization within and between studies, many of which obtained data from the older 450K and newer EPIC platforms. The...
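Main authors: | Vanderlinden, Lauren A.; Johnson, Randi K.; Carry, Patrick M.; Dong, Fran; DeMeo, Dawn L.; Yang, Ivana V.; Norris, Jill M.; Kechris, Katerina |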
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Research Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8424820/ https://www.ncbi.nlm.nih.gov/pubmed/34496950 http://dx.doi.org/10.1186/s13104-021-05741-2 |
_version_ | 1783749736054915072 |
---|---|
author | Vanderlinden, Lauren A.; Johnson, Randi K.; Carry, Patrick M.; Dong, Fran; DeMeo, Dawn L.; Yang, Ivana V.; Norris, Jill M.; Kechris, Katerina |
author_facet | Vanderlinden, Lauren A.; Johnson, Randi K.; Carry, Patrick M.; Dong, Fran; DeMeo, Dawn L.; Yang, Ivana V.; Norris, Jill M.; Kechris, Katerina |
author_sort | Vanderlinden, Lauren A. |
collection | PubMed |
description | OBJECTIVE: Illumina BeadChip arrays are commonly used to generate DNA methylation data for large epidemiological studies. Updates in technology over time create challenges for data harmonization within and between studies, many of which obtained data from both the older 450K and the newer EPIC platforms. The pre-processing pipeline for DNA methylation data is not trivial and influences downstream analyses. Incorporating different platforms adds a new level of technical variability that has not yet been taken into account by recommended pipelines. Our study evaluated how well various tools harmonize data across platform versions at each step of the pre-processing pipeline, including quality control (QC), normalization, batch effect adjustment, and correction for genomic inflation. We illustrate our novel approach using 450K and EPIC data from the Diabetes Autoimmunity Study in the Young (DAISY) prospective cohort. RESULTS: We found that normalization and probe filtering had the biggest effect on data harmonization. Employing a meta-analysis was an effective and easily executable method for accounting for platform variability. Correcting for genomic inflation also helped with harmonization. We present guidelines for studies seeking to harmonize data from the 450K and EPIC platforms, which include using technical replicates to evaluate numerous pre-processing steps and employing a meta-analysis. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13104-021-05741-2. |
format | Online Article Text |
id | pubmed-8424820 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8424820 2021-09-10 BMC Res Notes Research Note BioMed Central 2021-09-08 /pmc/articles/PMC8424820/ /pubmed/34496950 http://dx.doi.org/10.1186/s13104-021-05741-2 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | An effective processing pipeline for harmonizing DNA methylation data from Illumina’s 450K and EPIC platforms for epidemiological studies |
topic | Research Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8424820/ https://www.ncbi.nlm.nih.gov/pubmed/34496950 http://dx.doi.org/10.1186/s13104-021-05741-2 |