A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues

BACKGROUND: RNA-seq, a next-generation sequencing based method for transcriptome analysis, is rapidly emerging as the method of choice for comprehensive transcript abundance estimation. The accuracy of RNA-seq can be highly impacted by the purity of samples. A prominent, outstanding problem in RNA-s...

Bibliographic Details
Main authors: Li, Yi; Xie, Xiaohui
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3622628/
https://www.ncbi.nlm.nih.gov/pubmed/23735186
http://dx.doi.org/10.1186/1471-2105-14-S5-S11
author Li, Yi
Xie, Xiaohui
collection PubMed
description BACKGROUND: RNA-seq, a next-generation sequencing based method for transcriptome analysis, is rapidly emerging as the method of choice for comprehensive transcript abundance estimation. The accuracy of RNA-seq can be strongly affected by the purity of samples. A prominent, outstanding problem in RNA-seq is how to estimate transcript abundances in heterogeneous tissues, where a sample is composed of more than one cell type and the inhomogeneity can substantially confound the transcript abundance estimation of each individual cell type. Although experimental methods have been proposed to dissect multiple distinct cell types, computationally "deconvoluting" heterogeneous tissues provides an attractive alternative, since it keeps the tissue sample as well as the subsequent molecular content yield intact. RESULTS: Here we propose a probabilistic model-based approach, Transcript Estimation from Mixed Tissue samples (TEMT), to estimate the transcript abundances of each cell type of interest from RNA-seq data of heterogeneous tissue samples. TEMT incorporates positional and sequence-specific biases, and its online EM algorithm only requires a runtime proportional to the data size and a small constant memory. We test the proposed method on both simulated data and recently released ENCODE data, and show that TEMT significantly outperforms current state-of-the-art methods that do not take tissue heterogeneity into account. Currently, TEMT only resolves the tissue heterogeneity resulting from two cell types, but it can be extended to handle tissue heterogeneity resulting from multiple cell types. TEMT is written in Python, and is freely available at https://github.com/uci-cbcl/TEMT.
CONCLUSIONS: The probabilistic model-based approach proposed here provides a new method for analyzing RNA-seq data from heterogeneous tissue samples. By applying the method to both simulated data and ENCODE data, we show that explicitly accounting for tissue heterogeneity can significantly improve the accuracy of transcript abundance estimation.
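The two-cell-type mixture idea summarized in the abstract can be illustrated with a minimal sketch. This is not TEMT's implementation. It assumes, for illustration only, that the mixing proportion `tau` and the abundances of one cell type (`theta_a`) are already known, that every read is equally likely to come from any transcript it is compatible with (no positional or sequence-specific bias), and that a plain batch EM is acceptable in place of the online EM the abstract mentions. The function name `em_second_cell_type` and the toy read counts are hypothetical.

```python
import numpy as np


def em_second_cell_type(reads, theta_a, tau, n_iter=200):
    """Batch EM for the unknown component of a two-component mixture.

    reads   : list of tuples; each tuple holds the indices of the transcripts
              a read is compatible with (assumed non-empty)
    theta_a : known relative abundances of cell type A (assumption)
    tau     : known fraction of reads that come from cell type A (assumption)
    Returns the estimated relative abundances of cell type B.
    """
    theta_a = np.asarray(theta_a, dtype=float)
    n_transcripts = theta_a.size
    # Start from a uniform guess for cell type B.
    theta_b = np.full(n_transcripts, 1.0 / n_transcripts)

    for _ in range(n_iter):
        expected_b = np.zeros(n_transcripts)
        for compatible in reads:
            idx = np.asarray(compatible, dtype=int)
            # E-step: weight of each (cell type, transcript) explanation.
            from_a = tau * theta_a[idx]
            from_b = (1.0 - tau) * theta_b[idx]
            total = from_a.sum() + from_b.sum()
            if total > 0:
                # Expected number of reads assigned to B for each transcript.
                expected_b[idx] += from_b / total
        if expected_b.sum() == 0:
            break
        # M-step: renormalize the expected counts into abundances.
        theta_b = expected_b / expected_b.sum()
    return theta_b


# Toy mixed sample: 100 uniquely mapping reads over three transcripts.
toy_reads = [(0,)] * 40 + [(1,)] * 30 + [(2,)] * 30
estimate = em_second_cell_type(toy_reads, theta_a=[0.6, 0.3, 0.1], tau=0.5)
print(estimate)
```

The toy counts were built from a half-and-half mixture of `[0.6, 0.3, 0.1]` and `[0.2, 0.3, 0.5]`, so the estimate should approach the second vector. Reads compatible with several transcripts can be passed as longer tuples such as `(0, 2)`.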
format Online
Article
Text
id pubmed-3622628
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3622628 2013-04-15 A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues Li, Yi Xie, Xiaohui BMC Bioinformatics Proceedings BioMed Central 2013-04-10 /pmc/articles/PMC3622628/ /pubmed/23735186 http://dx.doi.org/10.1186/1471-2105-14-S5-S11 Text en Copyright © 2013 Li and Xie; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Proceedings