Guidelines for cell-type heterogeneity quantification based on a comparative analysis of reference-free DNA methylation deconvolution software

Bibliographic Details
Main authors: Decamps, Clémentine; Privé, Florian; Bacher, Raphael; Jost, Daniel; Waguet, Arthur; Houseman, Eugene Andres; Lurie, Eugene; Lutsik, Pavlo; Milosavljevic, Aleksandar; Scherer, Michael; Blum, Michael G. B.; Richard, Magali
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2020
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6958785/
https://www.ncbi.nlm.nih.gov/pubmed/31931698
http://dx.doi.org/10.1186/s12859-019-3307-2

Abstract

BACKGROUND: Cell-type heterogeneity of tumors is a key factor in tumor progression and response to chemotherapy. Tumor cell-type heterogeneity, defined as the proportion of the various cell-types in a tumor, can be inferred from DNA methylation of surgical specimens. However, confounding factors known to associate with methylation values, such as age and sex, complicate accurate inference of cell-type proportions. While reference-free algorithms have been developed to infer cell-type proportions from DNA methylation, a comparative evaluation of the performance of these methods is still lacking.

RESULTS: Here we use simulations to evaluate several computational pipelines based on the software packages MeDeCom, EDec, and RefFreeEWAS. We identify that accounting for confounders, feature selection, and the choice of the number of estimated cell types are critical steps for inferring cell-type proportions. We find that removal of methylation probes which are correlated with confounder variables reduces the error of inference by 30–35%, and that selection of cell-type informative probes has similar effect. We show that Cattell’s rule based on the scree plot is a powerful tool to determine the number of cell-types. Once the pre-processing steps are achieved, the three deconvolution methods provide comparable results. We observe that all the algorithms’ performance improves when inter-sample variation of cell-type proportions is large or when the number of available samples is large. We find that under specific circumstances the methods are sensitive to the initialization method, suggesting that averaging different solutions or optimizing initialization is an avenue for future research.

CONCLUSION: Based on the lessons learned, to facilitate pipeline validation and catalyze further pipeline improvement by the community, we develop a benchmark pipeline for inference of cell-type proportions and implement it in the R package medepir.
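Two of the preprocessing steps singled out in the abstract, removing probes correlated with confounders and choosing the number of cell types from a scree plot, can be illustrated with a short Python sketch. It is not the medepir implementation and rests on several assumptions: the helper names are hypothetical, the |r| >= 0.5 cutoff is arbitrary, and the visual Cattell elbow is replaced by an automated proxy (the largest ratio between consecutive PCA eigenvalues). The final +1 in the cell-type count follows from a linear mixing model D ≈ TA with proportions summing to one: once each probe is centered across samples, at most K-1 principal components carry cell-type signal.

```python
import numpy as np


def drop_confounded_probes(D, confounders, threshold=0.5):
    """Hypothetical helper: drop probes (rows of D) whose |Pearson r| with any confounder reaches threshold.

    D           : (n_probes, n_samples) methylation values (e.g. beta values).
    confounders : (n_samples, n_confounders) numeric covariates, e.g. age or sex coded 0/1.
    threshold   : assumed cutoff on |r|, chosen for illustration only.
    Returns the filtered matrix and a boolean mask of the kept probes.
    """
    Dc = D - D.mean(axis=1, keepdims=True)        # center each probe across samples
    Cc = confounders - confounders.mean(axis=0)   # center each confounder across samples
    denom = np.outer(np.linalg.norm(Dc, axis=1), np.linalg.norm(Cc, axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (Dc @ Cc) / denom                     # (n_probes, n_confounders) Pearson correlations
    # NaN correlations (constant probe or confounder) fail the comparison, so those rows are dropped.
    keep = np.all(np.abs(r) < threshold, axis=1)
    return D[keep], keep


def estimate_n_cell_types(D, max_components=10):
    """Hypothetical automated stand-in for Cattell's scree rule.

    The elbow is taken as the largest ratio between consecutive PCA eigenvalues
    among the leading `max_components`; the PCs before it are counted, and one is
    added because proportions summing to one leave at most K - 1 informative PCs
    once each probe is centered across samples.
    """
    Dc = D - D.mean(axis=1, keepdims=True)        # center each probe across samples
    eig = np.linalg.svd(Dc, compute_uv=False) ** 2 / (D.shape[1] - 1)  # PCA eigenvalues, descending
    # Skip the last eigenvalue, which centering drives to ~0 when n_samples <= n_probes.
    m = min(max_components, eig.size - 1)
    lead = eig[:m]
    ratios = lead[:-1] / np.maximum(lead[1:], 1e-12 * lead[0])  # guard against zero eigenvalues
    n_pcs = int(np.argmax(ratios)) + 1            # PCs to the left of the elbow
    return n_pcs + 1


if __name__ == "__main__":
    # Toy simulation (assumed setup, not the paper's benchmark) with K = 3 cell types.
    rng = np.random.default_rng(0)
    n_probes, n_samples, k = 500, 60, 3
    T = rng.uniform(0, 1, size=(n_probes, k))             # cell-type methylation profiles
    A = rng.dirichlet(np.ones(k), size=n_samples).T       # (k, n_samples), columns sum to 1
    age = rng.uniform(30, 80, size=n_samples)
    z = (age - age.mean()) / age.std()
    D = T @ A + rng.normal(0, 0.01, size=(n_probes, n_samples))
    D[:50] += 0.5 * z                                     # the first 50 probes also track age
    print("K estimated before filtering:", estimate_n_cell_types(D))
    D_filtered, keep = drop_confounded_probes(D, age[:, None])
    print("K estimated after filtering:", estimate_n_cell_types(D_filtered))
    print("probes kept:", int(keep.sum()), "of", n_probes)
```

In this toy simulation the age effect on 50 probes adds an extra principal component, so the estimate on unfiltered data is expected to overshoot the simulated K = 3, while the estimate after filtering should recover it.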

Source: BMC Bioinformatics (Methodology Article), BioMed Central, published 2020-01-13
License: © The Author(s). 2020. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.