OCMA: Fast, Memory-Efficient Factorization of Prohibitively Large Relationship Matrices
Matrices representing genetic relatedness among individuals (i.e., Genomic Relationship Matrices, GRMs) play a central role in genetic analysis. The eigen-decomposition of GRMs (or its alternative that generates fewer top singular values using genotype matrices) is a necessary step for many analyses...
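The abstract mentions obtaining top singular values directly from a genotype matrix instead of a full eigen-decomposition. The following is a minimal sketch of one way to do that out of core, assuming a tall row-major float64 matrix stored as a raw binary file. It is not OCMA's actual implementation: the function name `top_singular_values`, the file layout, and the chunk size are all hypothetical choices made for illustration. The sketch memory-maps the file with NumPy, accumulates the small M x M Gram matrix block by block, and takes square roots of its largest eigenvalues.

```python
import os
import tempfile

import numpy as np


def top_singular_values(path, n_rows, n_cols, k, chunk_rows=1024):
    """Top-k singular values of an (n_rows x n_cols) float64 matrix on disk.

    Hypothetical helper: assumes a raw, row-major float64 file and
    n_cols small enough that an n_cols x n_cols matrix fits in memory.
    """
    # Memory-map the file so the OS pages rows in on demand
    # instead of reading the whole matrix into RAM.
    X = np.memmap(path, dtype=np.float64, mode="r", shape=(n_rows, n_cols))

    # Accumulate the Gram matrix G = X^T X one block of rows at a time.
    gram = np.zeros((n_cols, n_cols), dtype=np.float64)
    for start in range(0, n_rows, chunk_rows):
        block = np.asarray(X[start:start + chunk_rows])
        gram += block.T @ block
    del X  # drop the mapping once the pass over the file is done

    # The eigenvalues of G are the squared singular values of X.
    eigvals = np.linalg.eigvalsh(gram)      # returned in ascending order
    top = eigvals[::-1][:k]                 # keep the k largest
    return np.sqrt(np.clip(top, 0.0, None))  # guard against tiny negatives


if __name__ == "__main__":
    # Small synthetic stand-in for a genotype matrix (N = 2000, M = 16).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2000, 16))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "genotypes.bin")
        A.tofile(path)
        print(top_singular_values(path, 2000, 16, k=5, chunk_rows=256))
```

Forming the Gram matrix squares the condition number of the problem, so this shortcut trades some numerical accuracy for a single sequential pass over the file. It only makes sense when the number of columns is far smaller than the number of rows.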
Main authors: | Xiong, Zhi; Zhang, Qingrun; Platt, Alexander; Liao, Wenyuan; Shi, Xinghua; de los Campos, Gustavo; Long, Quan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America 2018 |
Subjects: | Software and Data Resources |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6325911/ https://www.ncbi.nlm.nih.gov/pubmed/30482799 http://dx.doi.org/10.1534/g3.118.200908 |
_version_ | 1783386217883107328 |
---|---|
author | Xiong, Zhi Zhang, Qingrun Platt, Alexander Liao, Wenyuan Shi, Xinghua de los Campos, Gustavo Long, Quan |
author_sort | Xiong, Zhi |
collection | PubMed |
description | Matrices representing genetic relatedness among individuals (i.e., Genomic Relationship Matrices, GRMs) play a central role in genetic analysis. The eigen-decomposition of GRMs (or its alternative that generates fewer top singular values using genotype matrices) is a necessary step for many analyses including estimation of SNP-heritability, Principal Component Analysis (PCA), and genomic prediction. However, the GRMs and genotype matrices provided by modern biobanks are too large to be stored in active memory. To accommodate the current and future “bigger-data”, we develop a disk-based tool, Out-of-Core Matrices Analyzer (OCMA), using state-of-the-art computational techniques that can nimbly perform eigen and Singular Value Decomposition (SVD) analyses. By integrating memory mapping (mmap) and the latest matrix factorization libraries, our tool is fast and memory-efficient. To demonstrate the impressive performance of OCMA, we test it on a personal computer. For full eigen-decomposition, it solves an ordinary GRM (N = 10,000) in 55 sec. For SVD, a commonly used faster alternative of full eigen-decomposition in genomic analyses, OCMA solves the top 200 singular values (SVs) in half an hour, top 2,000 SVs in 0.95 hr, and all 5,000 SVs in 1.77 hr based on a very large genotype matrix (N = 1,000,000, M = 5,000) on the same personal computer. OCMA also supports multi-threading when running in a desktop or HPC cluster. Our OCMA tool can thus alleviate the computing bottleneck of classical analyses on large genomic matrices, and make it possible to scale up current and emerging analytical methods to big genomics data using lightweight computing resources. |
format | Online Article Text |
id | pubmed-6325911 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-6325911 2019-01-10 OCMA: Fast, Memory-Efficient Factorization of Prohibitively Large Relationship Matrices Xiong, Zhi Zhang, Qingrun Platt, Alexander Liao, Wenyuan Shi, Xinghua de los Campos, Gustavo Long, Quan G3 (Bethesda) Software and Data Resources Genetics Society of America 2018-11-27 /pmc/articles/PMC6325911/ /pubmed/30482799 http://dx.doi.org/10.1534/g3.118.200908 Text en Copyright © 2019 by the Genetics Society of America http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | OCMA: Fast, Memory-Efficient Factorization of Prohibitively Large Relationship Matrices |
topic | Software and Data Resources |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6325911/ https://www.ncbi.nlm.nih.gov/pubmed/30482799 http://dx.doi.org/10.1534/g3.118.200908 |