A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues
BACKGROUND: RNA-seq, a next-generation sequencing based method for transcriptome analysis, is rapidly emerging as the method of choice for comprehensive transcript abundance estimation. The accuracy of RNA-seq can be highly impacted by the purity of samples. A prominent, outstanding problem in RNA-s...
Main authors: | Li, Yi; Xie, Xiaohui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3622628/ https://www.ncbi.nlm.nih.gov/pubmed/23735186 http://dx.doi.org/10.1186/1471-2105-14-S5-S11 |
_version_ | 1782265856665845760 |
---|---|
author | Li, Yi Xie, Xiaohui |
author_facet | Li, Yi Xie, Xiaohui |
author_sort | Li, Yi |
collection | PubMed |
description | BACKGROUND: RNA-seq, a next-generation sequencing based method for transcriptome analysis, is rapidly emerging as the method of choice for comprehensive transcript abundance estimation. The accuracy of RNA-seq can be highly impacted by the purity of samples. A prominent, outstanding problem in RNA-seq is how to estimate transcript abundances in heterogeneous tissues, where a sample is composed of more than one cell type and the inhomogeneity can substantially confound the transcript abundance estimation of each individual cell type. Although experimental methods have been proposed to dissect multiple distinct cell types, computationally "deconvoluting" heterogeneous tissues provides an attractive alternative, since it keeps the tissue sample as well as the subsequent molecular content yield intact. RESULTS: Here we propose a probabilistic model-based approach, Transcript Estimation from Mixed Tissue samples (TEMT), to estimate the transcript abundances of each cell type of interest from RNA-seq data of heterogeneous tissue samples. TEMT incorporates positional and sequence-specific biases, and its online EM algorithm only requires a runtime proportional to the data size and a small constant memory. We test the proposed method on both simulation data and recently released ENCODE data, and show that TEMT significantly outperforms current state-of-the-art methods that do not take tissue heterogeneity into account. Currently, TEMT only resolves the tissue heterogeneity resulting from two cell types, but it can be extended to handle tissue heterogeneity resulting from multiple cell types. TEMT is written in Python, and is freely available at https://github.com/uci-cbcl/TEMT. CONCLUSIONS: The probabilistic model-based approach proposed here provides a new method for analyzing RNA-seq data from heterogeneous tissue samples. By applying the method to both simulation data and ENCODE data, we show that explicitly accounting for tissue heterogeneity can significantly improve the accuracy of transcript abundance estimation. |
format | Online Article Text |
id | pubmed-3622628 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3622628 2013-04-15 A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues Li, Yi Xie, Xiaohui BMC Bioinformatics Proceedings BioMed Central 2013-04-10 /pmc/articles/PMC3622628/ /pubmed/23735186 http://dx.doi.org/10.1186/1471-2105-14-S5-S11 Text en Copyright © 2013 Li and Xie; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Li, Yi Xie, Xiaohui A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues |
title | A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues |
title_full | A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues |
title_fullStr | A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues |
title_full_unstemmed | A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues |
title_short | A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues |
title_sort | mixture model for expression deconvolution from rna-seq in heterogeneous tissues |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3622628/ https://www.ncbi.nlm.nih.gov/pubmed/23735186 http://dx.doi.org/10.1186/1471-2105-14-S5-S11 |
work_keys_str_mv | AT liyi amixturemodelforexpressiondeconvolutionfromrnaseqinheterogeneoustissues AT xiexiaohui amixturemodelforexpressiondeconvolutionfromrnaseqinheterogeneoustissues AT liyi mixturemodelforexpressiondeconvolutionfromrnaseqinheterogeneoustissues AT xiexiaohui mixturemodelforexpressiondeconvolutionfromrnaseqinheterogeneoustissues |
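
The abstract describes a mixture model in which reads from a heterogeneous sample come from one of two cell types. The idea can be illustrated with a minimal sketch, which is not the TEMT implementation. It makes several assumptions that the record does not state: the transcript proportions of one cell type (`theta_b`) and the mixing fraction (`tau`) are known in advance, positional and sequence-specific biases and transcript lengths are ignored, and plain batch EM is used in place of the online EM algorithm the abstract mentions. The function name `estimate_theta_a` and all parameter names are hypothetical.

```python
from collections import Counter


def estimate_theta_a(reads, theta_b, tau, n_iter=500):
    """Toy batch EM for a two-cell-type read mixture (illustrative only).

    reads   : list of tuples; each tuple holds the transcript indices
              that one read from the mixed sample is compatible with
    theta_b : assumed-known transcript proportions of cell type B
    tau     : assumed-known fraction of reads that come from cell type A
    Returns the estimated transcript proportions of cell type A.
    """
    n_tx = len(theta_b)
    theta_a = [1.0 / n_tx] * n_tx  # uniform starting point
    # Collapse identical reads so each distinct pattern is processed once.
    read_counts = Counter(tuple(r) for r in reads)

    for _ in range(n_iter):
        expected = [0.0] * n_tx
        for compat, count in read_counts.items():
            # Probability of this read under the two-component mixture.
            denom = sum(
                tau * theta_a[t] + (1.0 - tau) * theta_b[t] for t in compat
            )
            if denom == 0.0:
                continue  # read cannot be explained by the current estimate
            for t in compat:
                # E-step: posterior that the read came from transcript t
                # of cell type A, weighted by how often the read occurs.
                expected[t] += count * tau * theta_a[t] / denom
        total = sum(expected)
        if total == 0.0:
            break
        # M-step: renormalise expected counts into proportions.
        theta_a = [e / total for e in expected]
    return theta_a


if __name__ == "__main__":
    # 100 uniquely mapping reads over three transcripts in the mixed sample.
    mixed_reads = [(0,)] * 40 + [(1,)] * 35 + [(2,)] * 25
    print(estimate_theta_a(mixed_reads, theta_b=[0.5, 0.5, 0.0], tau=0.5))
```

In this toy input every read maps to a single transcript, so the estimate can be checked by hand: the mixed proportions (0.40, 0.35, 0.25) equal `tau * theta_a + (1 - tau) * theta_b`, which gives a cell type A profile of approximately (0.3, 0.2, 0.5). Reads compatible with several transcripts are handled by the same loop, with the E-step splitting each read across its compatible transcripts.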