
A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures

Reference-based deconvolution methods use reference libraries of cell-specific DNA methylation (DNAm) measurements as a means toward deconvoluting cell proportions in heterogeneous biospecimens (e.g., whole-blood). As the accuracy of such methods depends highly on the CpG loci comprising the referen...


Bibliographic Details
Main authors: Bell-Glenn, Shelby; Thompson, Jeffrey A.; Salas, Lucas A.; Koestler, Devin C.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9004796/
https://www.ncbi.nlm.nih.gov/pubmed/35419567
http://dx.doi.org/10.3389/fbinf.2022.835591
author Bell-Glenn, Shelby
Thompson, Jeffrey A.
Salas, Lucas A.
Koestler, Devin C.
collection PubMed
description Reference-based deconvolution methods use reference libraries of cell-specific DNA methylation (DNAm) measurements as a means toward deconvoluting cell proportions in heterogeneous biospecimens (e.g., whole blood). As the accuracy of such methods depends highly on the CpG loci comprising the reference library, recent research efforts have focused on the selection of libraries to optimize deconvolution accuracy. While existing approaches for library selection work extremely well, the best-performing approaches require a training dataset consisting of both DNAm profiles over a heterogeneous cell population and gold-standard measurements of cell composition (e.g., flow cytometry) in the same samples. Here, we present a framework for reference library selection without a training dataset (RESET) and benchmark it against the Legacy method (minfi:pickCompProbes), where libraries are constructed based on a pre-specified number of cell-specific differentially methylated loci (DML). RESET uses a modified version of the Dispersion Separability Criteria (DSC) for comparing different libraries and has four main steps: 1) identify a candidate set of cell-specific DMLs, 2) randomly sample DMLs from the candidate set, 3) compute the Modified DSC of the selected DMLs, and 4) update the selection probabilities of DMLs based on their contribution to the Modified DSC. Steps 2–4 are repeated many times, and the library with the largest Modified DSC is selected for subsequent reference-based deconvolution. We evaluated RESET using several publicly available datasets consisting of whole-blood DNAm measurements with corresponding measurements of cell composition. We computed the RMSE and R² between the predicted cell proportions and their measured values. RESET outperformed the Legacy approach in selecting libraries that improve the accuracy of deconvolution estimates. Additionally, reference libraries constructed using RESET resulted in cellular composition estimates that explained more variation in DNAm as compared to the Legacy approach when evaluated in the context of epigenome-wide association studies (EWAS) of several publicly available datasets. This finding has implications for the statistical power of EWAS. RESET combats potential challenges associated with existing approaches for reference library assembly and thus may serve as a viable strategy for library construction in the absence of a training dataset.
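The four steps named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the Modified DSC, the candidate-selection rule, or the probability update, so `separability_score` (a between-type over within-type dispersion ratio), `candidate_loci` (ranking loci by the spread of cell-type means), and the multiplicative leave-one-out update in `reset_sketch` are all assumptions made here for illustration, as are every function and parameter name.

```python
import numpy as np

def separability_score(ref, cell_types):
    """Assumed stand-in for the Modified DSC: sqrt(between / within dispersion).
    ref: (n_samples, n_loci) methylation values from purified cell samples."""
    cell_types = np.asarray(cell_types)
    grand = ref.mean(axis=0)
    between, within = 0.0, 0.0
    for lab in np.unique(cell_types):
        block = ref[cell_types == lab]
        centroid = block.mean(axis=0)
        between += block.shape[0] * np.sum((centroid - grand) ** 2)
        within += np.sum((block - centroid) ** 2)
    return float(np.sqrt(between / (within + 1e-12)))

def candidate_loci(ref, cell_types, n_candidates):
    """Step 1 (assumed rule): keep loci whose cell-type means differ the most."""
    cell_types = np.asarray(cell_types)
    means = np.stack([ref[cell_types == lab].mean(axis=0)
                      for lab in np.unique(cell_types)])
    spread = means.max(axis=0) - means.min(axis=0)
    return np.argsort(spread)[::-1][:n_candidates]

def reset_sketch(ref, cell_types, candidates, library_size,
                 n_iter=200, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates)
    probs = np.full(candidates.size, 1.0 / candidates.size)
    best_score, best_library = -np.inf, None
    for _ in range(n_iter):
        # Step 2: sample a library according to the current probabilities.
        picked = rng.choice(candidates.size, size=library_size,
                            replace=False, p=probs)
        # Step 3: score the sampled library.
        score = separability_score(ref[:, candidates[picked]], cell_types)
        if score > best_score:
            best_score, best_library = score, np.sort(candidates[picked])
        # Step 4 (assumed update): raise or lower each locus's probability
        # according to the sign of its leave-one-out contribution.
        for j in picked:
            rest = picked[picked != j]
            loo = separability_score(ref[:, candidates[rest]], cell_types)
            probs[j] *= np.exp(step * np.sign(score - loo))
        probs /= probs.sum()
    # The library with the largest score seen over all iterations is returned.
    return best_library, best_score
```

A call such as `reset_sketch(ref, cell_types, candidate_loci(ref, cell_types, 500), library_size=100)` would return the indices of the selected loci and their score; the sizes shown are placeholders, not values taken from the paper.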
format Online
Article
Text
id pubmed-9004796
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9004796 2022-04-12 A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures Bell-Glenn, Shelby Thompson, Jeffrey A. Salas, Lucas A. Koestler, Devin C. Front Bioinform Bioinformatics Frontiers Media S.A. 2022-03-21 /pmc/articles/PMC9004796/ /pubmed/35419567 http://dx.doi.org/10.3389/fbinf.2022.835591 Text en Copyright © 2022 Bell-Glenn, Thompson, Salas and Koestler. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Novel Framework for the Identification of Reference DNA Methylation Libraries for Reference-Based Deconvolution of Cellular Mixtures
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9004796/
https://www.ncbi.nlm.nih.gov/pubmed/35419567
http://dx.doi.org/10.3389/fbinf.2022.835591