A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data

BACKGROUND: The Infinium EPIC array measures the methylation status of > 850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-proc...

Bibliographic Details
Main authors: Welsh, H., Batalha, C. M. P. F., Li, W., Mpye, K. L., Souza-Pinto, N. C., Naslavsky, M. S., Parra, E. J.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10008016/
https://www.ncbi.nlm.nih.gov/pubmed/36906598
http://dx.doi.org/10.1186/s13148-023-01459-z
author Welsh, H.
Batalha, C. M. P. F.
Li, W.
Mpye, K. L.
Souza-Pinto, N. C.
Naslavsky, M. S.
Parra, E. J.
collection PubMed
description BACKGROUND: The Infinium EPIC array measures the methylation status of > 850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe type bias as well as other issues such as background and dye bias. METHODS: This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearson’s correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data. RESULTS: The method we define as SeSAMe 2, which consists of the application of the regular SeSAMe pipeline with an additional round of QC, pOOBAH masking, was found to be the best performing normalization method, while quantile-based methods were found to be the worst performing methods. Whole-array Pearson’s correlations were found to be high. However, in agreement with previous studies, a substantial proportion of the probes on the EPIC array showed poor reproducibility (ICC < 0.50). The majority of poor performing probes have beta values close to either 0 or 1, and relatively low standard deviations. These results suggest that probe reliability is largely the result of limited biological variation rather than technical measurement variation. Importantly, normalizing the data with SeSAMe 2 dramatically improved ICC estimates, with the proportion of probes with ICC values > 0.50 increasing from 45.18% (raw data) to 61.35% (SeSAMe 2). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13148-023-01459-z.
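Two of the quantities named in the abstract, the absolute beta-value difference between replicates and the per-probe ICC, can be illustrated with a minimal sketch. The abstract does not say which ICC form the authors used, so the one-way random-effects ICC(1,1) below is an assumption, as are the function names and the toy beta values; this is not the authors' pipeline.

```python
from statistics import mean


def mean_abs_beta_diff(rep1, rep2):
    """Mean absolute beta-value difference between two replicates of one sample."""
    return mean(abs(a - b) for a, b in zip(rep1, rep2))


def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for a single probe (assumed ICC form).

    pairs: list of (beta_rep1, beta_rep2) tuples, one per replicated sample.
    """
    n = len(pairs)          # number of replicated samples
    k = 2                   # replicates per sample
    grand = mean(v for p in pairs for v in p)
    sample_means = [mean(p) for p in pairs]
    # Between-sample and within-sample mean squares.
    msb = k * sum((m - grand) ** 2 for m in sample_means) / (n - 1)
    msw = sum((v - m) ** 2 for p, m in zip(pairs, sample_means) for v in p) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    return (msb - msw) / denom if denom > 0 else float("nan")


# Hypothetical probe with wide variation between samples.
variable_probe = [(0.20, 0.22), (0.50, 0.48), (0.80, 0.81)]
# Hypothetical probe with beta values close to 0 and little variation between samples.
invariant_probe = [(0.02, 0.03), (0.03, 0.02), (0.02, 0.03)]

icc_variable = icc_oneway(variable_probe)
icc_invariant = icc_oneway(invariant_probe)
diff_variable = mean_abs_beta_diff([p[0] for p in variable_probe],
                                   [p[1] for p in variable_probe])
diff_invariant = mean_abs_beta_diff([p[0] for p in invariant_probe],
                                    [p[1] for p in invariant_probe])
```

In this toy data the near-zero probe has the smaller replicate difference yet the lower ICC, because there is almost no between-sample variance for the ICC to attribute to the samples. That is the pattern the abstract describes for probes with beta values close to 0 or 1.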
format Online
Article
Text
id pubmed-10008016
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10008016 2023-03-13 A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data Welsh, H. Batalha, C. M. P. F. Li, W. Mpye, K. L. Souza-Pinto, N. C. Naslavsky, M. S. Parra, E. J. Clin Epigenetics Research BioMed Central 2023-03-11 /pmc/articles/PMC10008016/ /pubmed/36906598 http://dx.doi.org/10.1186/s13148-023-01459-z Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data
topic Research