
A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data

A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e.,...


Bibliographic Details
Main Authors: Rangan, Aaditya V., McGrouther, Caroline C., Kelsoe, John, Schork, Nicholas, Stahl, Eli, Zhu, Qian, Krishnan, Arjun, Yao, Vicky, Troyanskaya, Olga, Bilaloglu, Seda, Raghavan, Preeti, Bergen, Sarah, Jureus, Anders, Landen, Mikael
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5997363/
https://www.ncbi.nlm.nih.gov/pubmed/29758032
http://dx.doi.org/10.1371/journal.pcbi.1006105
_version_ 1783331030242951168
author Rangan, Aaditya V.
McGrouther, Caroline C.
Kelsoe, John
Schork, Nicholas
Stahl, Eli
Zhu, Qian
Krishnan, Arjun
Yao, Vicky
Troyanskaya, Olga
Bilaloglu, Seda
Raghavan, Preeti
Bergen, Sarah
Jureus, Anders
Landen, Mikael
author_sort Rangan, Aaditya V.
collection PubMed
description A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., ‘loops’) within the data-matrix, and focuses on rows and columns of the data-matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical-experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design which commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical- and continuous-covariates, as well as sparsity within the data. We demonstrate these practical features with two examples; the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide-association-study (GWAS).
format Online
Article
Text
id pubmed-5997363
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-59973632018-06-21 A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data Rangan, Aaditya V. McGrouther, Caroline C. Kelsoe, John Schork, Nicholas Stahl, Eli Zhu, Qian Krishnan, Arjun Yao, Vicky Troyanskaya, Olga Bilaloglu, Seda Raghavan, Preeti Bergen, Sarah Jureus, Anders Landen, Mikael PLoS Comput Biol Research Article A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., ‘loops’) within the data-matrix, and focuses on rows and columns of the data-matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical-experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design which commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical- and continuous-covariates, as well as sparsity within the data. We demonstrate these practical features with two examples; the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide-association-study (GWAS). 
Public Library of Science 2018-05-14 /pmc/articles/PMC5997363/ /pubmed/29758032 http://dx.doi.org/10.1371/journal.pcbi.1006105 Text en © 2018 Rangan et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data
topic Research Article