
MGMR: leveraging RNA-Seq population data to optimize expression estimation



Bibliographic Details
Main authors:	Rozov, Roye; Halperin, Eran; Shamir, Ron
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3358656/ https://www.ncbi.nlm.nih.gov/pubmed/22537041 http://dx.doi.org/10.1186/1471-2105-13-S6-S2

author	Rozov, Roye; Halperin, Eran; Shamir, Ron
collection	PubMed
description	BACKGROUND: RNA-Seq is a technique that uses Next Generation Sequencing to identify transcripts and estimate transcription levels. When applying this technique for quantification, one must contend with reads that align to multiple positions in the genome (multireads). Previous efforts to resolve multireads have shown that RNA-Seq expression estimation can be improved using probabilistic allocation of reads to genes. These methods use a probabilistic generative model for data generation and resolve ambiguity using likelihood-based approaches. In many instances, RNA-Seq experiments are performed in the context of a population. The generative models of current methods do not take such population information into account, and it is an open question whether this information can improve quantification of the individual samples. RESULTS: To explore the contribution of population-level information to RNA-Seq quantification, we apply a hierarchical probabilistic generative model, which assumes that the expression levels of different individuals are sampled from a Dirichlet distribution with population-specific parameters, and that reads are sampled from the distribution of expression levels. We introduce an optimization procedure for estimating the model parameters, and use HapMap data and simulated data to demonstrate that the model yields a significant improvement in the accuracy of expression-level estimates for paralogous genes. CONCLUSIONS: We provide a proof of principle of the benefit of drawing on population commonalities to estimate expression. The results of our experiments demonstrate that this approach can be beneficial, primarily for estimation at the gene level.
format	Online Article Text
id	pubmed-3358656
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3358656 2012-06-07 MGMR: leveraging RNA-Seq population data to optimize expression estimation. Rozov, Roye; Halperin, Eran; Shamir, Ron. BMC Bioinformatics, Proceedings.
BioMed Central 2012-04-19 /pmc/articles/PMC3358656/ /pubmed/22537041 http://dx.doi.org/10.1186/1471-2105-13-S6-S2 Text en Copyright ©2012 Rozov et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	MGMR: leveraging RNA-Seq population data to optimize expression estimation
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3358656/ https://www.ncbi.nlm.nih.gov/pubmed/22537041 http://dx.doi.org/10.1186/1471-2105-13-S6-S2
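The abstract describes a hierarchical generative model: each individual's expression vector is drawn from a population-level Dirichlet distribution, reads are drawn from that expression vector, and multireads are resolved by probabilistic allocation. The Python sketch below illustrates that idea under stated assumptions and is not the MGMR optimization procedure from the paper. It assumes a read is equally likely to come from any gene it aligns to, estimates each individual's expression with MAP EM under a Dirichlet prior, and updates the shared prior with a simple pooling heuristic using a fixed concentration. The functions `simulate_individual`, `map_em`, and `fit_population`, the `compat` alignment table, and the `concentration` parameter are all hypothetical.

```python
import random
from collections import Counter


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_individual(alpha, compat, n_reads, rng):
    """Generative step: theta ~ Dirichlet(alpha), then each read's gene ~ Categorical(theta).

    Only the set of genes a read aligns to is observed. compat[g] lists the
    alignment sets a read from gene g may produce (hypothetical table); sets
    with more than one gene are multireads.
    """
    theta = sample_dirichlet(alpha, rng)
    genes = rng.choices(range(len(alpha)), weights=theta, k=n_reads)
    return theta, Counter(rng.choice(compat[g]) for g in genes)


def map_em(read_sets, alpha, n_iter=200):
    """MAP EM estimate of one individual's expression under a Dirichlet(alpha) prior.

    Assumption: a read is equally likely to come from any gene it aligns to
    (no positional, length, or quality model). Requires every alpha_g >= 1.
    """
    if min(alpha) < 1.0:
        raise ValueError("this sketch requires alpha_g >= 1 for every gene")
    n_genes = len(alpha)
    theta = [1.0 / n_genes] * n_genes
    for _ in range(n_iter):
        counts = [0.0] * n_genes
        # E-step: split each alignment set's reads among its genes in proportion to theta.
        for genes, n in read_sets.items():
            z = sum(theta[g] for g in genes)
            for g in genes:
                counts[g] += n * theta[g] / z
        # M-step: posterior mode, theta_g proportional to counts_g + alpha_g - 1.
        raw = [c + a - 1.0 for c, a in zip(counts, alpha)]
        total = sum(raw)
        theta = [r / total for r in raw]
    return theta


def fit_population(samples, n_genes, concentration=50.0, n_rounds=5):
    """Alternate per-individual MAP EM with re-estimation of a shared prior.

    The update alpha_g = 1 + concentration * mean_i(theta_ig) is a simple pooling
    heuristic with a fixed, hypothetical concentration, not a maximum-likelihood
    Dirichlet fit. Round 1 uses a flat prior (alpha = 1), i.e. plain ML EM.
    Returns the last per-individual estimates and the prior pooled from them.
    """
    alpha = [1.0] * n_genes
    thetas = []
    for _ in range(n_rounds):
        thetas = [map_em(s, alpha) for s in samples]
        mean = [sum(t[g] for t in thetas) / len(thetas) for g in range(n_genes)]
        alpha = [1.0 + concentration * m for m in mean]
    return thetas, alpha


if __name__ == "__main__":
    rng = random.Random(0)
    # Genes 0 and 1 act as paralogs: some of their reads align to both.
    compat = {0: [(0,), (0, 1)], 1: [(1,), (0, 1)], 2: [(2,)]}
    true_alpha = [20.0, 5.0, 10.0]
    sims = [simulate_individual(true_alpha, compat, 200, rng) for _ in range(8)]
    thetas, alpha = fit_population([reads for _, reads in sims], n_genes=3)
    for (truth, _), est in zip(sims, thetas):
        print([round(x, 3) for x in truth], [round(x, 3) for x in est])
```

With a flat prior, 30 unique reads for gene 0, 10 for gene 1, and 40 shared reads, `map_em` converges to an estimate of 0.75 and 0.25. A prior favoring gene 0 (for example `alpha = [11, 1]`) shifts the estimate to 0.8 and 0.2, which is the mechanism by which population information can influence the allocation of multireads between paralogs.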

