A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data
A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e.,...
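Main authors: | Rangan, Aaditya V.; McGrouther, Caroline C.; Kelsoe, John; Schork, Nicholas; Stahl, Eli; Zhu, Qian; Krishnan, Arjun; Yao, Vicky; Troyanskaya, Olga; Bilaloglu, Seda; Raghavan, Preeti; Bergen, Sarah; Jureus, Anders; Landen, Mikael |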
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5997363/ https://www.ncbi.nlm.nih.gov/pubmed/29758032 http://dx.doi.org/10.1371/journal.pcbi.1006105 |
_version_ | 1783331030242951168 |
---|---|
author | Rangan, Aaditya V. McGrouther, Caroline C. Kelsoe, John Schork, Nicholas Stahl, Eli Zhu, Qian Krishnan, Arjun Yao, Vicky Troyanskaya, Olga Bilaloglu, Seda Raghavan, Preeti Bergen, Sarah Jureus, Anders Landen, Mikael |
author_sort | Rangan, Aaditya V. |
collection | PubMed |
description | A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., ‘loops’) within the data-matrix, and focuses on rows and columns of the data-matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical-experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design which commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical- and continuous-covariates, as well as sparsity within the data. We demonstrate these practical features with two examples; the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide-association-study (GWAS). |
format | Online Article Text |
id | pubmed-5997363 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5997363 2018-06-21 PLoS Comput Biol Research Article Public Library of Science 2018-05-14 /pmc/articles/PMC5997363/ /pubmed/29758032 http://dx.doi.org/10.1371/journal.pcbi.1006105 Text en © 2018 Rangan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5997363/ https://www.ncbi.nlm.nih.gov/pubmed/29758032 http://dx.doi.org/10.1371/journal.pcbi.1006105 |
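
The description above outlines the idea of counting low-rank 2-by-2 submatrices ("loops") and keeping the rows and columns that participate in many of them. The following is a minimal sketch of that idea, not the authors' implementation. It rests on assumptions that the record does not state: the data are binarized to ±1 around the global median, a loop is scored as the product of its four binarized entries (+1 when the loop is rank-1, -1 otherwise), and the lowest-scoring rows and columns are removed a fraction at a time. The names `loop_scores`, `find_bicluster`, and `frac` are hypothetical, and the covariate and control corrections mentioned in the description are omitted.

```python
import numpy as np


def loop_scores(B):
    """Signed loop totals for each row and column of a +/-1 matrix B.

    For row j the score is the sum over j', k, k' of
    B[j,k] * B[j,k'] * B[j',k] * B[j',k'], which equals the sum over j'
    of (B @ B.T)[j, j'] ** 2. Degenerate loops (j' == j or k' == k) are
    included; for a +/-1 matrix they add the same constant to every row,
    so the ranking of rows is unaffected. Columns are treated symmetrically.
    """
    G = B @ B.T  # row-by-row inner products
    H = B.T @ B  # column-by-column inner products
    row = np.einsum("ij,ij->i", G, G)
    col = np.einsum("ij,ij->i", H, H)
    return row, col


def find_bicluster(D, n_rows, n_cols, frac=0.05):
    """Iteratively discard the rows and columns with the lowest loop scores.

    Assumption: binarization around the global median of D. Stops once
    n_rows rows and n_cols columns remain and returns their sorted indices.
    """
    B = np.where(D > np.median(D), 1, -1)
    rows = np.arange(D.shape[0])
    cols = np.arange(D.shape[1])
    while len(rows) > n_rows or len(cols) > n_cols:
        r, c = loop_scores(B[np.ix_(rows, cols)])
        if len(rows) > n_rows:
            # Drop at least one row, at most a fraction `frac`, never below target.
            drop = min(len(rows) - n_rows, max(1, int(frac * len(rows))))
            rows = rows[np.argsort(r)[drop:]]
        if len(cols) > n_cols:
            drop = min(len(cols) - n_cols, max(1, int(frac * len(cols))))
            cols = cols[np.argsort(c)[drop:]]
    return np.sort(rows), np.sort(cols)
```

In this sketch the target bicluster size is supplied by the caller; a practical variant would instead record the order in which rows and columns are eliminated and choose a cutoff afterwards.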