A novel f-divergence based generative adversarial imputation method for scRNA-seq data analysis

Comprehensive analysis of single-cell RNA sequencing (scRNA-seq) data can enhance our understanding of cellular diversity and aid in the development of personalized therapies for individuals. The abundance of missing values, known as dropouts, makes the analysis of scRNA-seq data a challenging task....

Bibliographic Details
Main authors: Si, Tong; Hopkins, Zackary; Yanev, John; Hou, Jie; Gong, Haijun
Format: Online Article Text
Language: English
Published: Public Library of Science, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10637660/
https://www.ncbi.nlm.nih.gov/pubmed/37948433
http://dx.doi.org/10.1371/journal.pone.0292792
author Si, Tong
Hopkins, Zackary
Yanev, John
Hou, Jie
Gong, Haijun
collection PubMed
description Comprehensive analysis of single-cell RNA sequencing (scRNA-seq) data can enhance our understanding of cellular diversity and aid in the development of personalized therapies for individuals. The abundance of missing values, known as dropouts, makes the analysis of scRNA-seq data a challenging task. Most traditional methods make assumptions about specific distributions for missing values, which limits their capability to capture the intricacy of high-dimensional scRNA-seq data. Moreover, the imputation performance of traditional methods decreases with higher missing rates. We propose a novel f-divergence based generative adversarial imputation method, called sc-fGAIN, for scRNA-seq data imputation. Our studies identify four f-divergence functions, namely cross-entropy, Kullback-Leibler (KL), reverse KL, and Jensen-Shannon, that can be effectively integrated with the generative adversarial imputation network to generate imputed values without any assumptions, and we mathematically prove that the distribution of the data imputed by the sc-fGAIN algorithm is the same as the distribution of the original data. Analysis of real scRNA-seq data shows that, compared to many traditional methods, the sc-fGAIN algorithm generates imputed values with a smaller root-mean-square error, is robust to varying missing rates, and can reduce imputation variability. The flexibility offered by the f-divergence allows the sc-fGAIN method to accommodate various types of data, making it a more universal approach for imputing missing values in scRNA-seq data.
format Online
Article
Text
id pubmed-10637660
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10637660 2023-11-11 PLoS One Research Article
Public Library of Science 2023-11-10 /pmc/articles/PMC10637660/ /pubmed/37948433 http://dx.doi.org/10.1371/journal.pone.0292792 Text en © 2023 Si et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A novel f-divergence based generative adversarial imputation method for scRNA-seq data analysis
topic Research Article