Benchmarking of cell type deconvolution pipelines for transcriptomics data
Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking....
Main authors: | Avila Cobos, Francisco; Alquicira-Hernandez, José; Powell, Joseph E.; Mestdagh, Pieter; De Preter, Katleen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7648640/ https://www.ncbi.nlm.nih.gov/pubmed/33159064 http://dx.doi.org/10.1038/s41467-020-19015-1 |
_version_ | 1783607151887908864 |
---|---|
author | Avila Cobos, Francisco Alquicira-Hernandez, José Powell, Joseph E. Mestdagh, Pieter De Preter, Katleen |
author_facet | Avila Cobos, Francisco Alquicira-Hernandez, José Powell, Joseph E. Mestdagh, Pieter De Preter, Katleen |
author_sort | Avila Cobos, Francisco |
collection | PubMed |
description | Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance. |
format | Online Article Text |
id | pubmed-7648640 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-76486402020-11-10 Benchmarking of cell type deconvolution pipelines for transcriptomics data Avila Cobos, Francisco Alquicira-Hernandez, José Powell, Joseph E. Mestdagh, Pieter De Preter, Katleen Nat Commun Article Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance. Nature Publishing Group UK 2020-11-06 /pmc/articles/PMC7648640/ /pubmed/33159064 http://dx.doi.org/10.1038/s41467-020-19015-1 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Avila Cobos, Francisco Alquicira-Hernandez, José Powell, Joseph E. Mestdagh, Pieter De Preter, Katleen Benchmarking of cell type deconvolution pipelines for transcriptomics data |
title | Benchmarking of cell type deconvolution pipelines for transcriptomics data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7648640/ https://www.ncbi.nlm.nih.gov/pubmed/33159064 http://dx.doi.org/10.1038/s41467-020-19015-1 |
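
The description field above mentions generating pseudo-bulk mixtures from scRNA-seq data and inferring cell type proportions from them. The following is a minimal sketch of that idea under stated assumptions, not the authors' pipeline: the count matrix is synthetic, the cell type names `A`, `B`, `C` and the helper names `make_pseudo_bulk`, `build_reference` and `deconvolve` are hypothetical, and the estimator is plain least squares with clipping and renormalization rather than any method evaluated in the article. Mixing is done by summing counts in linear scale, where a mixture is a weighted sum of cell type profiles.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, cells_per_type = 50, 200
cell_types = ["A", "B", "C"]  # hypothetical cell type names

# Synthetic stand-in for an scRNA-seq count matrix (genes x cells).
signatures = rng.gamma(shape=2.0, scale=50.0, size=(n_genes, len(cell_types)))
sc_counts = np.hstack([
    rng.poisson(np.repeat(signatures[:, [j]], cells_per_type, axis=1))
    for j in range(len(cell_types))
]).astype(float)
cell_labels = np.repeat(cell_types, cells_per_type)


def make_pseudo_bulk(counts, labels, n_cells, proportions, rng):
    """Sum randomly sampled single cells into one pseudo-bulk profile (linear scale)."""
    profile = np.zeros(counts.shape[0])
    for cell_type, fraction in proportions.items():
        candidates = np.flatnonzero(labels == cell_type)
        k = int(round(n_cells * fraction))
        chosen = rng.choice(candidates, size=k, replace=True)
        profile += counts[:, chosen].sum(axis=1)
    return profile


def build_reference(counts, labels, types):
    """Average expression per cell type, giving a genes x cell-types reference matrix."""
    return np.column_stack([counts[:, labels == t].mean(axis=1) for t in types])


def deconvolve(reference, bulk):
    """Least-squares fit, then clip negatives and rescale so the estimates sum to 1."""
    coef, *_ = np.linalg.lstsq(reference, bulk, rcond=None)
    coef = np.clip(coef, 0.0, None)
    return coef / coef.sum()


true_props = {"A": 0.5, "B": 0.3, "C": 0.2}  # known composition of the mixture
bulk = make_pseudo_bulk(sc_counts, cell_labels, 300, true_props, rng)
reference = build_reference(sc_counts, cell_labels, cell_types)
estimated = deconvolve(reference, bulk)

true_vector = np.array([true_props[t] for t in cell_types])
rmse = float(np.sqrt(np.mean((estimated - true_vector) ** 2)))
print(dict(zip(cell_types, np.round(estimated, 3))), "RMSE:", round(rmse, 4))
```

Because the true proportions are known by construction, the error of any estimator can be measured directly. Dropping one column from `reference` before calling `deconvolve` is a simple way to explore the effect of a cell type that is present in the mixture but missing from the reference.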