Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge

Transcriptomic assays that measure expression levels are widely used to study the manifestation of environmental or genetic variations in cellular processes. RNA-sequencing in particular has the potential to considerably improve such understanding because of its capacity to assay the entire transcri...

Bibliographic Details
Main Authors: Mostafavi, Sara; Battle, Alexis; Zhu, Xiaowei; Urban, Alexander E.; Levinson, Douglas; Montgomery, Stephen B.; Koller, Daphne
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3715474/
https://www.ncbi.nlm.nih.gov/pubmed/23874524
http://dx.doi.org/10.1371/journal.pone.0068141
_version_ 1782277462632169472
author Mostafavi, Sara
Battle, Alexis
Zhu, Xiaowei
Urban, Alexander E.
Levinson, Douglas
Montgomery, Stephen B.
Koller, Daphne
author_facet Mostafavi, Sara
Battle, Alexis
Zhu, Xiaowei
Urban, Alexander E.
Levinson, Douglas
Montgomery, Stephen B.
Koller, Daphne
author_sort Mostafavi, Sara
collection PubMed
description Transcriptomic assays that measure expression levels are widely used to study the manifestation of environmental or genetic variations in cellular processes. RNA-sequencing in particular has the potential to considerably improve such understanding because of its capacity to assay the entire transcriptome, including novel transcriptional events. However, as with earlier expression assays, analysis of RNA-sequencing data requires carefully accounting for factors that may introduce systematic, confounding variability in the expression measurements, resulting in spurious correlations. Here, we consider the problem of modeling and removing the effects of known and hidden confounding factors from RNA-sequencing data. We describe a unified residual framework that encapsulates existing approaches, and using this framework, present a novel method, HCP (Hidden Covariates with Prior). HCP uses a more informed assumption about the confounding factors, and performs as well or better than existing approaches while having a much lower computational cost. Our experiments demonstrate that accounting for known and hidden factors with appropriate models improves the quality of RNA-sequencing data in two very different tasks: detecting genetic variations that are associated with nearby expression variations (cis-eQTLs), and constructing accurate co-expression networks.
format Online
Article
Text
id pubmed-3715474
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3715474 2013-07-19 Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge Mostafavi, Sara Battle, Alexis Zhu, Xiaowei Urban, Alexander E. Levinson, Douglas Montgomery, Stephen B. Koller, Daphne PLoS One Research Article Transcriptomic assays that measure expression levels are widely used to study the manifestation of environmental or genetic variations in cellular processes. RNA-sequencing in particular has the potential to considerably improve such understanding because of its capacity to assay the entire transcriptome, including novel transcriptional events. However, as with earlier expression assays, analysis of RNA-sequencing data requires carefully accounting for factors that may introduce systematic, confounding variability in the expression measurements, resulting in spurious correlations. Here, we consider the problem of modeling and removing the effects of known and hidden confounding factors from RNA-sequencing data. We describe a unified residual framework that encapsulates existing approaches, and using this framework, present a novel method, HCP (Hidden Covariates with Prior). HCP uses a more informed assumption about the confounding factors, and performs as well or better than existing approaches while having a much lower computational cost. Our experiments demonstrate that accounting for known and hidden factors with appropriate models improves the quality of RNA-sequencing data in two very different tasks: detecting genetic variations that are associated with nearby expression variations (cis-eQTLs), and constructing accurate co-expression networks.
Public Library of Science 2013-07-18 /pmc/articles/PMC3715474/ /pubmed/23874524 http://dx.doi.org/10.1371/journal.pone.0068141 Text en © 2013 Mostafavi et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Mostafavi, Sara
Battle, Alexis
Zhu, Xiaowei
Urban, Alexander E.
Levinson, Douglas
Montgomery, Stephen B.
Koller, Daphne
Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge
title Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge
title_full Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge
title_fullStr Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge
title_full_unstemmed Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge
title_short Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge
title_sort normalizing rna-sequencing data by modeling hidden covariates with prior knowledge
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3715474/
https://www.ncbi.nlm.nih.gov/pubmed/23874524
http://dx.doi.org/10.1371/journal.pone.0068141
work_keys_str_mv AT mostafavisara normalizingrnasequencingdatabymodelinghiddencovariateswithpriorknowledge
AT battlealexis normalizingrnasequencingdatabymodelinghiddencovariateswithpriorknowledge
AT zhuxiaowei normalizingrnasequencingdatabymodelinghiddencovariateswithpriorknowledge
AT urbanalexandere normalizingrnasequencingdatabymodelinghiddencovariateswithpriorknowledge
AT levinsondouglas normalizingrnasequencingdatabymodelinghiddencovariateswithpriorknowledge
AT montgomerystephenb normalizingrnasequencingdatabymodelinghiddencovariateswithpriorknowledge
AT kollerdaphne normalizingrnasequencingdatabymodelinghiddencovariateswithpriorknowledge