Guidelines for cell-type heterogeneity quantification based on a comparative analysis of reference-free DNA methylation deconvolution software
BACKGROUND: Cell-type heterogeneity of tumors is a key factor in tumor progression and response to chemotherapy. Tumor cell-type heterogeneity, defined as the proportion of the various cell-types in a tumor, can be inferred from DNA methylation of surgical specimens. However, confounding factors kno...
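Main authors: | Decamps, Clémentine; Privé, Florian; Bacher, Raphael; Jost, Daniel; Waguet, Arthur; Houseman, Eugene Andres; Lurie, Eugene; Lutsik, Pavlo; Milosavljevic, Aleksandar; Scherer, Michael; Blum, Michael G. B.; Richard, Magali |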
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2020
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6958785/ https://www.ncbi.nlm.nih.gov/pubmed/31931698 http://dx.doi.org/10.1186/s12859-019-3307-2 |
_version_ | 1783487489116209152 |
---|---|
author | Decamps, Clémentine Privé, Florian Bacher, Raphael Jost, Daniel Waguet, Arthur Houseman, Eugene Andres Lurie, Eugene Lutsik, Pavlo Milosavljevic, Aleksandar Scherer, Michael Blum, Michael G. B. Richard, Magali |
author_facet | Decamps, Clémentine Privé, Florian Bacher, Raphael Jost, Daniel Waguet, Arthur Houseman, Eugene Andres Lurie, Eugene Lutsik, Pavlo Milosavljevic, Aleksandar Scherer, Michael Blum, Michael G. B. Richard, Magali |
author_sort | Decamps, Clémentine |
collection | PubMed |
description | BACKGROUND: Cell-type heterogeneity of tumors is a key factor in tumor progression and response to chemotherapy. Tumor cell-type heterogeneity, defined as the proportion of the various cell-types in a tumor, can be inferred from DNA methylation of surgical specimens. However, confounding factors known to associate with methylation values, such as age and sex, complicate accurate inference of cell-type proportions. While reference-free algorithms have been developed to infer cell-type proportions from DNA methylation, a comparative evaluation of the performance of these methods is still lacking. RESULTS: Here we use simulations to evaluate several computational pipelines based on the software packages MeDeCom, EDec, and RefFreeEWAS. We identify that accounting for confounders, feature selection, and the choice of the number of estimated cell types are critical steps for inferring cell-type proportions. We find that removal of methylation probes which are correlated with confounder variables reduces the error of inference by 30–35%, and that selection of cell-type informative probes has similar effect. We show that Cattell’s rule based on the scree plot is a powerful tool to determine the number of cell-types. Once the pre-processing steps are achieved, the three deconvolution methods provide comparable results. We observe that all the algorithms’ performance improves when inter-sample variation of cell-type proportions is large or when the number of available samples is large. We find that under specific circumstances the methods are sensitive to the initialization method, suggesting that averaging different solutions or optimizing initialization is an avenue for future research. CONCLUSION: Based on the lessons learned, to facilitate pipeline validation and catalyze further pipeline improvement by the community, we develop a benchmark pipeline for inference of cell-type proportions and implement it in the R package medepir. |
format | Online Article Text |
id | pubmed-6958785 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-69587852020-01-17 Guidelines for cell-type heterogeneity quantification based on a comparative analysis of reference-free DNA methylation deconvolution software Decamps, Clémentine Privé, Florian Bacher, Raphael Jost, Daniel Waguet, Arthur Houseman, Eugene Andres Lurie, Eugene Lutsik, Pavlo Milosavljevic, Aleksandar Scherer, Michael Blum, Michael G. B. Richard, Magali BMC Bioinformatics Methodology Article BACKGROUND: Cell-type heterogeneity of tumors is a key factor in tumor progression and response to chemotherapy. Tumor cell-type heterogeneity, defined as the proportion of the various cell-types in a tumor, can be inferred from DNA methylation of surgical specimens. However, confounding factors known to associate with methylation values, such as age and sex, complicate accurate inference of cell-type proportions. While reference-free algorithms have been developed to infer cell-type proportions from DNA methylation, a comparative evaluation of the performance of these methods is still lacking. RESULTS: Here we use simulations to evaluate several computational pipelines based on the software packages MeDeCom, EDec, and RefFreeEWAS. We identify that accounting for confounders, feature selection, and the choice of the number of estimated cell types are critical steps for inferring cell-type proportions. We find that removal of methylation probes which are correlated with confounder variables reduces the error of inference by 30–35%, and that selection of cell-type informative probes has similar effect. We show that Cattell’s rule based on the scree plot is a powerful tool to determine the number of cell-types. Once the pre-processing steps are achieved, the three deconvolution methods provide comparable results. We observe that all the algorithms’ performance improves when inter-sample variation of cell-type proportions is large or when the number of available samples is large. 
We find that under specific circumstances the methods are sensitive to the initialization method, suggesting that averaging different solutions or optimizing initialization is an avenue for future research. CONCLUSION: Based on the lessons learned, to facilitate pipeline validation and catalyze further pipeline improvement by the community, we develop a benchmark pipeline for inference of cell-type proportions and implement it in the R package medepir. BioMed Central 2020-01-13 /pmc/articles/PMC6958785/ /pubmed/31931698 http://dx.doi.org/10.1186/s12859-019-3307-2 Text en © The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Decamps, Clémentine Privé, Florian Bacher, Raphael Jost, Daniel Waguet, Arthur Houseman, Eugene Andres Lurie, Eugene Lutsik, Pavlo Milosavljevic, Aleksandar Scherer, Michael Blum, Michael G. B. Richard, Magali Guidelines for cell-type heterogeneity quantification based on a comparative analysis of reference-free DNA methylation deconvolution software |
title | Guidelines for cell-type heterogeneity quantification based on a comparative analysis of reference-free DNA methylation deconvolution software |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6958785/ https://www.ncbi.nlm.nih.gov/pubmed/31931698 http://dx.doi.org/10.1186/s12859-019-3307-2 |
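
The abstract names two pre-processing steps, removing probes correlated with confounders and choosing the number of cell types from a scree plot with Cattell's rule. The following is a minimal Python sketch of both ideas, not the medepir implementation (medepir is an R package). The function names, the correlation threshold of 0.3, and the automated elbow detection by largest second difference are all assumptions made for illustration; the record does not say how the authors test for confounder association or how they read the scree plot.

```python
import numpy as np


def keep_unconfounded_probes(D, confounders, max_abs_corr=0.3):
    """Return a boolean mask of probes to keep.

    D           : probes x samples methylation matrix (beta values).
    confounders : samples x covariates numeric matrix (e.g. age, sex as 0/1).
    max_abs_corr: hypothetical cutoff; a probe is dropped when its absolute
                  Pearson correlation with any confounder reaches it.
    """
    D = np.asarray(D, dtype=float)
    C = np.asarray(confounders, dtype=float)
    if C.ndim == 1:
        C = C[:, None]
    Dc = D - D.mean(axis=1, keepdims=True)   # center each probe
    Cc = C - C.mean(axis=0, keepdims=True)   # center each confounder
    num = Dc @ Cc                            # probes x covariates
    den = np.linalg.norm(Dc, axis=1)[:, None] * np.linalg.norm(Cc, axis=0)[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        # Constant probes or covariates have zero norm; treat them as uncorrelated.
        corr = np.where(den > 0, num / den, 0.0)
    return np.abs(corr).max(axis=1) < max_abs_corr


def scree_eigenvalues(D):
    """Eigenvalues of the sample covariance of D, in decreasing order."""
    X = np.asarray(D, dtype=float).T         # samples x probes
    X = X - X.mean(axis=0)                   # center each probe
    s = np.linalg.svd(X, compute_uv=False)   # singular values, descending
    return s ** 2 / (X.shape[0] - 1)


def components_before_elbow(eigenvalues):
    """Count the components that precede the flat part of the scree plot.

    The elbow is taken as the point with the largest second difference,
    a simple stand-in for the visual inspection Cattell's rule relies on.
    """
    ev = np.asarray(eigenvalues, dtype=float)
    second_diff = ev[:-2] - 2.0 * ev[1:-1] + ev[2:]
    return int(np.argmax(second_diff)) + 1


# Toy data: 3 probes x 4 samples, with one confounder (age).
age = np.array([1.0, 2.0, 3.0, 4.0])
D = np.array([
    [0.1, 0.2, 0.3, 0.4],   # tracks age exactly, so it is dropped
    [0.5, 0.1, 0.1, 0.5],   # uncorrelated with age, so it is kept
    [0.3, 0.3, 0.3, 0.3],   # constant, so it is kept
])
mask = keep_unconfounded_probes(D, age)
n_components = components_before_elbow([10.0, 6.0, 2.0, 1.9, 1.8, 1.7])
```

In practice the mask would be applied to the matrix (`D[mask]`) before computing the scree plot, and the resulting component count would guide the number of cell types passed to a deconvolution tool such as MeDeCom, EDec, or RefFreeEWAS.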