A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes

While there are >2 million publicly-available human microarray gene-expression profiles, these profiles were measured using a variety of platforms that each cover a pre-defined, limited set of genes. Therefore, key to reanalyzing and integrating this massive data collection are methods that can c...

Bibliographic Details
Main Authors:	Mancuso, Christopher A; Canfield, Jacob L; Singla, Deepak; Krishnan, Arjun
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Computational Biology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7708069/ https://www.ncbi.nlm.nih.gov/pubmed/33074331 http://dx.doi.org/10.1093/nar/gkaa881

author	Mancuso, Christopher A; Canfield, Jacob L; Singla, Deepak; Krishnan, Arjun
collection	PubMed
description	While there are >2 million publicly-available human microarray gene-expression profiles, these profiles were measured using a variety of platforms that each cover a pre-defined, limited set of genes. Therefore, key to reanalyzing and integrating this massive data collection are methods that can computationally reconstitute the complete transcriptome in partially-measured microarray samples by imputing the expression of unmeasured genes. Current state-of-the-art imputation methods are tailored to samples from a specific platform and rely on gene-gene relationships regardless of the biological context of the target sample. We show that sparse regression models that capture sample-sample relationships (termed SampleLASSO), built on-the-fly for each new target sample to be imputed, outperform models based on fixed gene relationships. Extensive evaluation involving three machine learning algorithms (LASSO, k-nearest-neighbors, and deep-neural-networks), two gene subsets (GPL96–570 and LINCS), and multiple imputation tasks (within and across microarray/RNA-seq datasets) establishes that SampleLASSO is the most accurate model. Additionally, we demonstrate the biological interpretability of this method by showing that, for imputing a target sample from a certain tissue, SampleLASSO automatically leverages training samples from the same tissue. Thus, SampleLASSO is a simple, yet powerful and flexible approach for harmonizing large-scale gene-expression data.
format	Online Article Text
id	pubmed-7708069
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7708069 2020-12-07 Nucleic Acids Res Oxford University Press 2020-10-19 /pmc/articles/PMC7708069/ /pubmed/33074331 http://dx.doi.org/10.1093/nar/gkaa881 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes
topic	Computational Biology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7708069/ https://www.ncbi.nlm.nih.gov/pubmed/33074331 http://dx.doi.org/10.1093/nar/gkaa881
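
The abstract describes SampleLASSO as a sparse regression over training samples that is fitted anew for each target sample. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the target's measured genes are regressed on the same genes in every training sample, and that the learned per-sample weights are then applied to the training samples' values for the unmeasured genes. The function names, the toy data, the `alpha` value, the omission of an intercept, and the plain-Python coordinate-descent solver are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(columns, target, alpha, n_sweeps=500):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1, no intercept.

    columns: one list per feature (here, one per training sample), each of length n.
    """
    n = len(target)
    weights = [0.0] * len(columns)
    residual = list(target)  # equals y - Xw, with w = 0 at the start
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            norm_sq = sum(x * x for x in col)
            if norm_sq == 0.0:
                continue
            # Correlation of this column with the residual that excludes its own term.
            rho = sum(x * (r + x * weights[j]) for x, r in zip(col, residual))
            new_w = soft_threshold(rho / n, alpha) / (norm_sq / n)
            delta = new_w - weights[j]
            residual = [r - x * delta for x, r in zip(col, residual)]
            weights[j] = new_w
    return weights


def sample_lasso_impute(training, target_measured, measured_idx, unmeasured_idx, alpha=0.01):
    """Impute unmeasured genes of one target sample (hypothetical helper).

    training: list of fully measured samples, each a list of values over all genes.
    target_measured: the target's values at measured_idx, in the same order.
    """
    # Each training sample becomes one regression feature over the measured genes.
    columns = [[sample[g] for g in measured_idx] for sample in training]
    weights = fit_lasso(columns, target_measured, alpha)
    # Apply the per-sample weights to the training values of the unmeasured genes.
    imputed = [
        sum(w * sample[g] for w, sample in zip(weights, training))
        for g in unmeasured_idx
    ]
    return imputed, weights


# Invented toy data: two samples from "tissue A", two from "tissue B", six genes.
training = [
    [5.0, 1.0, 3.0, 0.5, 4.0, 1.0],  # tissue A
    [4.0, 2.0, 3.5, 1.0, 3.0, 2.0],  # tissue A
    [1.0, 5.0, 0.5, 4.0, 0.5, 6.0],  # tissue B
    [0.5, 4.0, 1.0, 5.0, 1.0, 5.0],  # tissue B
]
measured_idx = [0, 1, 2, 3]
unmeasured_idx = [4, 5]
# The target is the average of the two tissue A samples, seen only at measured genes.
target_measured = [4.5, 1.5, 3.25, 0.75]

imputed, weights = sample_lasso_impute(
    training, target_measured, measured_idx, unmeasured_idx
)
print("imputed:", imputed)
print("weights per training sample:", weights)
```

In this toy setting the weights concentrate on the two tissue A samples, which mirrors the interpretability claim in the abstract: the model is rebuilt for each target, so the nonzero weights indicate which training samples it relied on.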
