Leveraging heterogeneity across multiple datasets increases cell-mixture deconvolution accuracy and reduces biological and technical biases

In silico quantification of cell proportions from mixed-cell transcriptomics data (deconvolution) requires a reference expression matrix, called basis matrix. We hypothesize that matrices created using only healthy samples from a single microarray platform would introduce biological and technical bi...

Bibliographic Details
Main authors: Vallania, Francesco, Tam, Andrew, Lofgren, Shane, Schaffert, Steven, Azad, Tej D., Bongen, Erika, Haynes, Winston, Alsup, Meia, Alonso, Michael, Davis, Mark, Engleman, Edgar, Khatri, Purvesh
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6226523/
https://www.ncbi.nlm.nih.gov/pubmed/30413720
http://dx.doi.org/10.1038/s41467-018-07242-6
_version_ 1783369961243148288
author Vallania, Francesco
Tam, Andrew
Lofgren, Shane
Schaffert, Steven
Azad, Tej D.
Bongen, Erika
Haynes, Winston
Alsup, Meia
Alonso, Michael
Davis, Mark
Engleman, Edgar
Khatri, Purvesh
author_sort Vallania, Francesco
collection PubMed
description In silico quantification of cell proportions from mixed-cell transcriptomics data (deconvolution) requires a reference expression matrix, called basis matrix. We hypothesize that matrices created using only healthy samples from a single microarray platform would introduce biological and technical biases in deconvolution. We show presence of such biases in two existing matrices, IRIS and LM22, irrespective of deconvolution method. Here, we present immunoStates, a basis matrix built using 6160 samples with different disease states across 42 microarray platforms. We find that immunoStates significantly reduces biological and technical biases. Importantly, we find that different methods have virtually no or minimal effect once the basis matrix is chosen. We further show that cellular proportion estimates using immunoStates are consistently more correlated with measured proportions than IRIS and LM22, across all methods. Our results demonstrate the need and importance of incorporating biological and technical heterogeneity in a basis matrix for achieving consistently high accuracy.
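The description above treats deconvolution as recovering cell-type proportions from a mixed-cell expression profile given a basis matrix, and then scoring the estimates by their correlation with measured proportions. The following is a minimal sketch of that idea, not the method used in the article: it assumes a simple linear mixing model and solves it with projected gradient descent as a stand-in for the published deconvolution methods. The function names `deconvolve` and `pearson`, and the toy 4-gene by 3-cell-type basis matrix, are hypothetical and chosen only for illustration.

```python
def deconvolve(basis, mixture, steps=2000):
    """Estimate non-negative proportions f so that basis * f approximates mixture.

    basis   : list of rows (genes), each row holding one value per cell type
    mixture : list of expression values, one per gene
    Uses projected gradient descent on the least-squares objective
    (an illustrative solver, assumed here for simplicity).
    """
    n_genes, n_types = len(basis), len(basis[0])
    # Step size 1 / (sum of squared entries) is a conservative bound that
    # keeps the gradient iteration stable.
    step = 1.0 / sum(b * b for row in basis for b in row)
    f = [1.0 / n_types] * n_types
    for _ in range(steps):
        residual = [
            sum(basis[g][k] * f[k] for k in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(basis[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        f = [max(0.0, f[k] - step * grad[k]) for k in range(n_types)]
    total = sum(f)
    # Rescale so the estimates sum to 1 and can be read as proportions.
    return [x / total for x in f] if total > 0 else f


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Toy basis matrix (hypothetical values): rows are genes, columns are cell types.
basis = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 1.0],
    [1.0, 1.0, 9.0],
    [5.0, 5.0, 5.0],
]
measured = [0.5, 0.3, 0.2]  # stand-in for measured proportions
# Simulate a noise-free mixed-cell profile from the measured proportions.
mixture = [sum(b * p for b, p in zip(row, measured)) for row in basis]

estimated = deconvolve(basis, mixture)
print([round(p, 4) for p in estimated])
print(round(pearson(estimated, measured), 4))
```

In this sketch the mixture is generated from the same basis matrix that is used to deconvolve it, so the estimates recover the simulated proportions. The biases discussed in the description would arise when the basis matrix does not match the platform or disease state of the samples being deconvolved.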
format Online
Article
Text
id pubmed-6226523
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6226523 2018-11-13 Nat Commun Article Nature Publishing Group UK 2018-11-09 /pmc/articles/PMC6226523/ /pubmed/30413720 http://dx.doi.org/10.1038/s41467-018-07242-6 Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Leveraging heterogeneity across multiple datasets increases cell-mixture deconvolution accuracy and reduces biological and technical biases
topic Article