
Bayesian Correlation Analysis for Sequence Count Data

Bibliographic details
Main authors: Sánchez-Taltavull, Daniel; Ramachandran, Parameswaran; Lau, Nelson; Perkins, Theodore J.
Format: Online article (text)
Language: English
Published: Public Library of Science, 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5049778/
https://www.ncbi.nlm.nih.gov/pubmed/27701449
http://dx.doi.org/10.1371/journal.pone.0163595

Abstract

Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose genomic occupancy is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and the uncertainty in those levels. That uncertainty stems from varying sequencing depth across experiments and from varying absolute levels of individual entities, both of which affect measurement precision. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low, especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities' signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset.
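The abstract describes the estimator only in outline. The following is a hypothetical illustration of the general idea, not the authors' formulas. It assumes that each entity's read count in an experiment is Binomial given the sequencing depth, with a Beta(alpha, beta) prior on the entity's read fraction. It then scores correlation as a ratio of posterior expectations: the expected centred cross-product sum divided by the square root of the expected centred sums of squares. Posterior variances enter only the denominator, so wide posteriors from low counts pull the score toward zero. This mirrors the qualitative behaviour the abstract reports. The function names `beta_posterior` and `bayesian_correlation` are invented for this example. The default alpha = beta = 1 is the naive uniform prior the abstract warns can bias estimates when depths vary widely, and the authors' two alternative priors are not reproduced here.

```python
import math
from typing import Sequence, Tuple


def beta_posterior(count: int, depth: int,
                   alpha: float = 1.0, beta: float = 1.0) -> Tuple[float, float]:
    """Posterior mean and variance of an entity's read fraction p in one
    experiment, ASSUMING count ~ Binomial(depth, p) and p ~ Beta(alpha, beta)."""
    a = alpha + count
    b = beta + (depth - count)
    total = a + b
    mean = a / total
    var = a * b / (total ** 2 * (total + 1))  # variance of Beta(a, b)
    return mean, var


def bayesian_correlation(x_counts: Sequence[int], y_counts: Sequence[int],
                         depths: Sequence[int],
                         alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical 'ratio of posterior expectations' correlation (a sketch,
    not the paper's estimator).

    Posteriors for x and y are treated as independent, so the expected
    cross-product sum depends only on posterior means. Each expected sum of
    squares adds (n - 1) / n times the summed posterior variances to the
    spread of the posterior means.
    """
    n = len(depths)
    if not (len(x_counts) == len(y_counts) == n) or n < 2:
        raise ValueError("need equal-length count vectors over at least 2 experiments")
    px = [beta_posterior(c, d, alpha, beta) for c, d in zip(x_counts, depths)]
    py = [beta_posterior(c, d, alpha, beta) for c, d in zip(y_counts, depths)]
    mx = [m for m, _ in px]
    my = [m for m, _ in py]
    cx = sum(mx) / n
    cy = sum(my) / n
    cross = sum((a - cx) * (b - cy) for a, b in zip(mx, my))
    shrink = (n - 1) / n
    ssx = sum((a - cx) ** 2 for a in mx) + shrink * sum(v for _, v in px)
    ssy = sum((b - cy) ** 2 for b in my) + shrink * sum(v for _, v in py)
    return cross / math.sqrt(ssx * ssy)


if __name__ == "__main__":
    depths = [1_000_000] * 3
    high = bayesian_correlation([1000, 2000, 3000], [2000, 4000, 6000], depths)
    low = bayesian_correlation([1, 2, 3], [2, 4, 6], depths)
    print(f"high counts: {high:.4f}, low counts: {low:.4f}")
```

In both pairs of the usage example, y is exactly twice x at equal depths, so plain Pearson correlation on the raw counts is 1 for each. In this sketch the high-count pair stays close to 1, while the low-count pair drops well below it because its posterior variances are large relative to the spread of its posterior means. By the Cauchy-Schwarz inequality, the score always lies in [-1, 1].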

Published in PLoS One (Research Article), Public Library of Science, 2016-10-04.

© 2016 Sánchez-Taltavull et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.