A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures
Main authors: | Bell-Glenn, Shelby; Thompson, Jeffrey A.; Salas, Lucas A.; Koestler, Devin C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9004796/ https://www.ncbi.nlm.nih.gov/pubmed/35419567 http://dx.doi.org/10.3389/fbinf.2022.835591 |
author | Bell-Glenn, Shelby; Thompson, Jeffrey A.; Salas, Lucas A.; Koestler, Devin C. |
collection | PubMed |
description | Reference-based deconvolution methods use reference libraries of cell-specific DNA methylation (DNAm) measurements as a means toward deconvoluting cell proportions in heterogeneous biospecimens (e.g., whole-blood). As the accuracy of such methods depends highly on the CpG loci comprising the reference library, recent research efforts have focused on the selection of libraries to optimize deconvolution accuracy. While existing approaches for library selection work extremely well, the best-performing approaches require a training data set consisting of both DNAm profiles over a heterogeneous cell population and gold-standard measurements of cell composition (e.g., flow cytometry) in the same samples. Here, we present a framework for reference library selection without a training dataset (RESET) and benchmark it against the Legacy method (minfi:pickCompProbes), where libraries are constructed based on a pre-specified number of cell-specific differentially methylated loci (DML). RESET uses a modified version of the Dispersion Separability Criterion (DSC) for comparing different libraries and has four main steps: 1) identify a candidate set of cell-specific DMLs, 2) randomly sample DMLs from the candidate set, 3) compute the Modified DSC of the selected DMLs, and 4) update the selection probabilities of DMLs based on their contribution to the Modified DSC. Steps 2–4 are repeated many times and the library with the largest Modified DSC is selected for subsequent reference-based deconvolution. We evaluated RESET using several publicly available datasets consisting of whole-blood DNAm measurements with corresponding measurements of cell composition. We computed the RMSE and R² between the predicted cell proportions and their measured values. RESET outperformed the Legacy approach in selecting libraries that improve the accuracy of deconvolution estimates. 
Additionally, reference libraries constructed using RESET resulted in cellular composition estimates that explained more variation in DNAm as compared to the Legacy approach when evaluated in the context of epigenome-wide association studies (EWAS) of several publicly available data sets. This finding has implications for the statistical power of EWAS. RESET combats potential challenges associated with existing approaches for reference library assembly and thus, may serve as a viable strategy for library construction in the absence of a training data set. |
format | Online Article Text |
id | pubmed-9004796 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9004796, 2022-04-12. Front Bioinform (Bioinformatics). Frontiers Media S.A., 2022-03-21. /pmc/articles/PMC9004796/ /pubmed/35419567 http://dx.doi.org/10.3389/fbinf.2022.835591. Text, en. Copyright © 2022 Bell-Glenn, Thompson, Salas and Koestler. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9004796/ https://www.ncbi.nlm.nih.gov/pubmed/35419567 http://dx.doi.org/10.3389/fbinf.2022.835591 |
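The four RESET steps listed in the description can be illustrated with a small NumPy sketch. It is not the authors' implementation. The record does not define the Modified DSC, so the sketch assumes the standard DSC, sqrt(tr S_between) / sqrt(tr S_within), computed over purified cell-type reference profiles. The one-vs-rest mean-difference filter in `candidate_dmls` (step 1) and the multiplicative leave-one-out reweighting in `select_library` (step 4) are hypothetical stand-ins chosen for illustration, and all function names are hypothetical.

```python
import numpy as np


def dsc(X, labels):
    """Standard (unmodified) Dispersion Separability Criterion.

    X: samples x loci matrix of beta values from purified cell types.
    labels: cell type of each sample. Each cell type is weighted by its
    share of samples; only traces are needed, so per-locus variances suffice.
    """
    labels = np.asarray(labels)
    n = X.shape[0]
    grand_mean = X.mean(axis=0)
    tr_between = 0.0
    tr_within = 0.0
    for cell_type in np.unique(labels):
        Xg = X[labels == cell_type]
        weight = Xg.shape[0] / n
        tr_between += weight * np.sum((Xg.mean(axis=0) - grand_mean) ** 2)
        tr_within += weight * np.sum(Xg.var(axis=0))
    if tr_within == 0:
        return np.inf
    return np.sqrt(tr_between / tr_within)


def candidate_dmls(X, labels, per_type=5):
    """Step 1 stand-in: per cell type, keep the loci with the largest
    absolute one-vs-rest difference in mean methylation."""
    labels = np.asarray(labels)
    chosen = set()
    for cell_type in np.unique(labels):
        diff = X[labels == cell_type].mean(axis=0) - X[labels != cell_type].mean(axis=0)
        chosen.update(np.argsort(-np.abs(diff))[:per_type].tolist())
    return np.array(sorted(chosen))


def select_library(X, labels, candidates, library_size, n_iter=200, step=0.5, seed=0):
    """Steps 2-4: sample libraries, score them by DSC, reweight loci, and keep
    the best-scoring library. The update rule is hypothetical; library_size >= 2."""
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates)
    probs = np.full(len(candidates), 1.0 / len(candidates))
    best_lib, best_score = None, -np.inf
    for _ in range(n_iter):
        # Step 2: draw a library without replacement using current probabilities.
        idx = rng.choice(len(candidates), size=library_size, replace=False, p=probs)
        library = candidates[idx]
        # Step 3: score the sampled library.
        score = dsc(X[:, library], labels)
        if score > best_score:
            best_lib, best_score = np.sort(library), score
        # Step 4: upweight loci whose removal would lower the score, downweight the rest.
        for pos, locus in zip(idx, library):
            contribution = score - dsc(X[:, library[library != locus]], labels)
            probs[pos] *= np.exp(step * contribution)
        probs /= probs.sum()
    return best_lib, best_score
```

A typical call would be `select_library(X, labels, candidate_dmls(X, labels), library_size=...)`. Returning the best library seen, rather than the last one sampled, follows the description's rule of keeping the library with the largest score.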
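The evaluation in the description (deconvolve mixed samples with a chosen library, then compare estimated and measured cell proportions by RMSE and R²) can be sketched as follows. As an assumption, deconvolution here is least squares constrained to the probability simplex (non-negative proportions summing to one), solved by projected gradient descent. This is a stand-in and not necessarily the projection used in the paper. R² is taken as the squared Pearson correlation between pooled estimates and measurements, which is one common definition. Function names are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * ks > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0)


def estimate_proportions(reference, beta, n_iter=2000):
    """Minimise 0.5 * ||reference @ w - beta||^2 over the simplex.

    reference: loci x cell types matrix of mean methylation per cell type.
    beta: one mixed sample's methylation values at the same loci.
    """
    step = 1.0 / np.linalg.norm(reference, 2) ** 2  # 1 / Lipschitz constant of the gradient
    k = reference.shape[1]
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        grad = reference.T @ (reference @ w - beta)
        w = project_to_simplex(w - step * grad)
    return w


def rmse(pred, truth):
    """Root mean squared error between estimated and measured proportions."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(truth)) ** 2)))


def r_squared(pred, truth):
    """Squared Pearson correlation between pooled estimates and measurements."""
    return float(np.corrcoef(np.ravel(pred), np.ravel(truth))[0, 1] ** 2)
```

In a benchmark of the kind described, `reference` would hold cell-type means at a selected library's loci, `beta` a whole-blood sample at those loci, and the measured proportions would come from, e.g., flow cytometry.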