Benchmarking of cell type deconvolution pipelines for transcriptomics data

Bibliographic details
Main authors: Avila Cobos, Francisco, Alquicira-Hernandez, José, Powell, Joseph E., Mestdagh, Pieter, De Preter, Katleen
Format: Online article, text
Language: English
Published: Nature Publishing Group UK, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7648640/
https://www.ncbi.nlm.nih.gov/pubmed/33159064
http://dx.doi.org/10.1038/s41467-020-19015-1
Collection: PubMed
Abstract: Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance.
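To make the pseudo-bulk setup concrete, here is a minimal Python sketch under simplified assumptions; it is not the pipeline used in the benchmark. Simulated Poisson counts stand in for a real scRNA-seq dataset, the reference is the mean linear-scale profile of each cell type, and proportions are estimated with non-negative least squares (NNLS) via `scipy.optimize.nnls`, which is only one of many possible deconvolution approaches. The helper names `make_pseudo_bulk`, `reference_matrix` and `deconvolve` are hypothetical. Because a pseudo-bulk sample is literally a sum of cells, the model bulk ≈ reference × weights is additive only in linear space, which is one intuition for why a log transform can hurt.

```python
import numpy as np
from scipy.optimize import nnls

# Toy stand-in for an scRNA-seq dataset (assumption: Poisson counts around
# random per-cell-type mean profiles; real data are far noisier).
rng = np.random.default_rng(0)
n_genes, n_types, cells_per_type = 50, 3, 40
type_means = rng.gamma(shape=2.0, scale=5.0, size=(n_genes, n_types))
labels = np.repeat(np.arange(n_types), cells_per_type)
cells = rng.poisson(type_means[:, labels])  # genes x cells, linear-scale counts


def make_pseudo_bulk(cells, labels, proportions, n_cells, rng):
    """Hypothetical helper: sum randomly drawn cells in the target proportions."""
    counts = np.round(np.asarray(proportions) * n_cells).astype(int)
    picked = [rng.choice(np.flatnonzero(labels == t), size=c, replace=True)
              for t, c in enumerate(counts)]
    bulk = cells[:, np.concatenate(picked)].sum(axis=1).astype(float)
    return bulk, counts / counts.sum()  # mixture and its true proportions


def reference_matrix(cells, labels, types):
    """Hypothetical helper: mean linear-scale expression per listed cell type."""
    return np.column_stack([cells[:, labels == t].mean(axis=1) for t in types])


def deconvolve(bulk, reference):
    """Fit bulk ~ reference @ w with w >= 0, then rescale w to sum to 1."""
    weights, _residual = nnls(reference, bulk)
    return weights / weights.sum()


all_types = range(n_types)
bulk, truth = make_pseudo_bulk(cells, labels, [0.5, 0.3, 0.2], 500, rng)
estimate = deconvolve(bulk, reference_matrix(cells, labels, all_types))
print("true:", truth.round(3), "estimated:", estimate.round(3))

# Same data after log1p: the log of a sum is not the sum of logs, so the
# additive model assumed by deconvolve() no longer holds.
log_est = deconvolve(np.log1p(bulk),
                     np.log1p(reference_matrix(cells, labels, all_types)))
print("log-scale estimate:", log_est.round(3))

# Reference missing cell type 2: its signal is forced onto types 0 and 1.
partial = deconvolve(bulk, reference_matrix(cells, labels, [0, 1]))
print("estimate without type 2 in reference:", partial.round(3))
```

In this toy setting, dropping a cell type from the reference forces the two remaining estimates to sum to 1 even though those types make up only 80% of the mixture, a small-scale illustration of why omitting cell types that are present in a mixture inflates the other estimates.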
Institution: National Center for Biotechnology Information
Journal: Nature Communications (Nat Commun), published 2020-11-06. © The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.