A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data
BACKGROUND: The Infinium EPIC array measures the methylation status of > 850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-proc...
Main authors: | Welsh, H.; Batalha, C. M. P. F.; Li, W.; Mpye, K. L.; Souza-Pinto, N. C.; Naslavsky, M. S.; Parra, E. J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10008016/ https://www.ncbi.nlm.nih.gov/pubmed/36906598 http://dx.doi.org/10.1186/s13148-023-01459-z |
_version_ | 1784905662561517568 |
---|---|
author | Welsh, H.; Batalha, C. M. P. F.; Li, W.; Mpye, K. L.; Souza-Pinto, N. C.; Naslavsky, M. S.; Parra, E. J. |
author_sort | Welsh, H. |
collection | PubMed |
description | BACKGROUND: The Infinium EPIC array measures the methylation status of > 850,000 CpG sites. The EPIC BeadChip uses a two-array design: Infinium Type I and Type II probes. These probe types exhibit different technical characteristics which may confound analyses. Numerous normalization and pre-processing methods have been developed to reduce probe type bias as well as other issues such as background and dye bias. METHODS: This study evaluates the performance of various normalization methods using 16 replicated samples and three metrics: absolute beta-value difference, overlap of non-replicated CpGs between replicate pairs, and effect on beta-value distributions. Additionally, we carried out Pearson’s correlation and intraclass correlation coefficient (ICC) analyses using both raw and SeSAMe 2 normalized data. RESULTS: The method we define as SeSAMe 2, which consists of the application of the regular SeSAMe pipeline with an additional round of QC, pOOBAH masking, was found to be the best performing normalization method, while quantile-based methods were found to be the worst performing methods. Whole-array Pearson’s correlations were found to be high. However, in agreement with previous studies, a substantial proportion of the probes on the EPIC array showed poor reproducibility (ICC < 0.50). The majority of poor performing probes have beta values close to either 0 or 1, and relatively low standard deviations. These results suggest that probe reliability is largely the result of limited biological variation rather than technical measurement variation. Importantly, normalizing the data with SeSAMe 2 dramatically improved ICC estimates, with the proportion of probes with ICC values > 0.50 increasing from 45.18% (raw data) to 61.35% (SeSAMe 2). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13148-023-01459-z. |
format | Online Article Text |
id | pubmed-10008016 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10008016 2023-03-13 Clin Epigenetics Research BioMed Central 2023-03-11 /pmc/articles/PMC10008016/ /pubmed/36906598 http://dx.doi.org/10.1186/s13148-023-01459-z Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10008016/ https://www.ncbi.nlm.nih.gov/pubmed/36906598 http://dx.doi.org/10.1186/s13148-023-01459-z |
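
The description above names two replicate-based measures: the absolute beta-value difference between replicate pairs and a per-probe intraclass correlation coefficient (ICC). The following is a minimal sketch of how such measures could be computed, not the authors' pipeline. The record does not say which ICC form was used, so the one-way random-effects ICC(1,1) is an assumption here, and the function names `mean_abs_beta_diff` and `icc_oneway` are hypothetical.

```python
from statistics import mean


def mean_abs_beta_diff(rep_a, rep_b):
    """Mean absolute beta-value difference between two replicates.

    rep_a and rep_b hold beta values (0..1) for the same probes in the same order.
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same probes")
    return mean(abs(a - b) for a, b in zip(rep_a, rep_b))


def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for a single probe (assumed ICC form).

    pairs is a list of (replicate_1, replicate_2) beta values, one tuple per sample.
    """
    n = len(pairs)  # number of replicated samples
    k = 2           # measurements per sample
    if n < 2:
        raise ValueError("need at least two replicated samples")
    sample_means = [(a + b) / k for a, b in pairs]
    grand_mean = mean(sample_means)
    # Between-sample mean square: spread of the sample means around the grand mean.
    msb = k * sum((m - grand_mean) ** 2 for m in sample_means) / (n - 1)
    # Within-sample mean square: disagreement between replicates of one sample.
    msw = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, sample_means)
    ) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        return float("nan")  # no variation at all, ICC undefined
    return (msb - msw) / denom


# A probe that varies between samples and replicates perfectly.
variable_probe = [(0.1, 0.1), (0.5, 0.5), (0.9, 0.9)]
# A probe near 0 with tiny replicate differences but no between-sample variation.
flat_probe = [(0.01, 0.02), (0.02, 0.01), (0.01, 0.02), (0.02, 0.01)]

print(icc_oneway(variable_probe))
print(icc_oneway(flat_probe))
print(mean_abs_beta_diff([0.1, 0.5], [0.2, 0.5]))
```

The second toy probe illustrates the point made in the description: replicate differences of only 0.01 still give a low ICC when the probe barely varies between samples, because the ICC compares between-sample variance with within-sample variance.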