
OCMA: Fast, Memory-Efficient Factorization of Prohibitively Large Relationship Matrices

Matrices representing genetic relatedness among individuals (i.e., Genomic Relationship Matrices, GRMs) play a central role in genetic analysis. The eigen-decomposition of GRMs (or its alternative that generates fewer top singular values using genotype matrices) is a necessary step for many analyses...


Bibliographic Details
Main authors: Xiong, Zhi, Zhang, Qingrun, Platt, Alexander, Liao, Wenyuan, Shi, Xinghua, de los Campos, Gustavo, Long, Quan
Format: Online Article Text
Language: English
Published: Genetics Society of America 2018
Subjects: Software and Data Resources
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6325911/
https://www.ncbi.nlm.nih.gov/pubmed/30482799
http://dx.doi.org/10.1534/g3.118.200908
author Xiong, Zhi
Zhang, Qingrun
Platt, Alexander
Liao, Wenyuan
Shi, Xinghua
de los Campos, Gustavo
Long, Quan
collection PubMed
description Matrices representing genetic relatedness among individuals (i.e., Genomic Relationship Matrices, GRMs) play a central role in genetic analysis. The eigen-decomposition of GRMs (or its alternative that generates fewer top singular values using genotype matrices) is a necessary step for many analyses including estimation of SNP-heritability, Principal Component Analysis (PCA), and genomic prediction. However, the GRMs and genotype matrices provided by modern biobanks are too large to be stored in active memory. To accommodate the current and future “bigger-data”, we develop a disk-based tool, Out-of-Core Matrices Analyzer (OCMA), using state-of-the-art computational techniques that can nimbly perform eigen and Singular Value Decomposition (SVD) analyses. By integrating memory mapping (mmap) and the latest matrix factorization libraries, our tool is fast and memory-efficient. To demonstrate the impressive performance of OCMA, we test it on a personal computer. For full eigen-decomposition, it solves an ordinary GRM (N = 10,000) in 55 sec. For SVD, a commonly used faster alternative of full eigen-decomposition in genomic analyses, OCMA solves the top 200 singular values (SVs) in half an hour, top 2,000 SVs in 0.95 hr, and all 5,000 SVs in 1.77 hr based on a very large genotype matrix (N = 1,000,000, M = 5,000) on the same personal computer. OCMA also supports multi-threading when running in a desktop or HPC cluster. Our OCMA tool can thus alleviate the computing bottleneck of classical analyses on large genomic matrices, and make it possible to scale up current and emerging analytical methods to big genomics data using lightweight computing resources.
format Online
Article
Text
id pubmed-6325911
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-6325911 2019-01-10 G3 (Bethesda) Software and Data Resources
Genetics Society of America 2018-11-27 /pmc/articles/PMC6325911/ /pubmed/30482799 http://dx.doi.org/10.1534/g3.118.200908 Text en Copyright © 2019 by the Genetics Society of America http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title OCMA: Fast, Memory-Efficient Factorization of Prohibitively Large Relationship Matrices
topic Software and Data Resources
work_keys_str_mv AT xiongzhi ocmafastmemoryefficientfactorizationofprohibitivelylargerelationshipmatrices
AT zhangqingrun ocmafastmemoryefficientfactorizationofprohibitivelylargerelationshipmatrices
AT plattalexander ocmafastmemoryefficientfactorizationofprohibitivelylargerelationshipmatrices
AT liaowenyuan ocmafastmemoryefficientfactorizationofprohibitivelylargerelationshipmatrices
AT shixinghua ocmafastmemoryefficientfactorizationofprohibitivelylargerelationshipmatrices
AT deloscamposgustavo ocmafastmemoryefficientfactorizationofprohibitivelylargerelationshipmatrices
AT longquan ocmafastmemoryefficientfactorizationofprohibitivelylargerelationshipmatrices
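
Illustrative note. The description above says the tool combines memory mapping (mmap) with matrix factorization libraries to obtain the top singular values of a genotype matrix that does not fit in memory. The record gives no implementation details, so the following is only a minimal Python sketch of that general idea, not OCMA's actual code. It assumes the matrix is stored as a raw row-major float64 file with N rows and M columns (N much larger than M), and it uses NumPy's `np.memmap` plus a streamed Gram matrix, which is a technique chosen here for illustration. The function name `top_singular_values` and all of its parameters are hypothetical.

```python
import numpy as np


def top_singular_values(path, n_rows, n_cols, k, block_rows=4096):
    """Top-k singular values of a tall matrix kept on disk (assumed raw float64, row-major)."""
    # Map the file into virtual memory; pages are read from disk only when touched.
    X = np.memmap(path, dtype=np.float64, mode="r", shape=(n_rows, n_cols))

    # Accumulate the small M x M Gram matrix X^T X one block of rows at a time,
    # so only block_rows x M values need to be resident in RAM at once.
    gram = np.zeros((n_cols, n_cols), dtype=np.float64)
    for start in range(0, n_rows, block_rows):
        block = np.asarray(X[start:start + block_rows])
        gram += block.T @ block

    # Eigenvalues of X^T X are the squared singular values of X (returned ascending).
    eigenvalues = np.linalg.eigvalsh(gram)
    top = eigenvalues[::-1][:k]

    # Clip tiny negative values caused by floating-point round-off before the square root.
    return np.sqrt(np.clip(top, 0.0, None))
```

Forming the Gram matrix keeps peak memory proportional to M squared plus one block, but it also squares the condition number of the problem, so small singular values lose accuracy. A production tool would likely choose a more careful factorization; this sketch only shows how disk-backed storage and a standard eigen-solver can be combined.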