LDmat: efficiently queryable compression of linkage disequilibrium matrices

Bibliographic Details
Main authors: Weiner, Rockwell J; Lakhani, Chirag; Knowles, David A; Gürsoy, Gamze
Format: Online article, text
Language: English
Published: Oxford University Press, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9969815/
https://www.ncbi.nlm.nih.gov/pubmed/36794924
http://dx.doi.org/10.1093/bioinformatics/btad092
Description
Summary:

MOTIVATION: Linkage disequilibrium (LD) matrices derived from large populations are widely used in population genetics for fine-mapping, LD score regression, and linear mixed models in genome-wide association studies (GWAS). However, these matrices can become very large when they are derived from millions of individuals, so moving, sharing, and extracting granular information from them can be cumbersome.

RESULTS: To address the need to compress and easily query large LD matrices, we developed LDmat, a standalone tool that compresses large LD matrices into the HDF5 file format and queries the compressed matrices. It can extract submatrices corresponding to a sub-region of the genome, a list of selected loci, or loci within a minor allele frequency range. LDmat can also rebuild the original file formats from the compressed files.

AVAILABILITY AND IMPLEMENTATION: LDmat is implemented in Python and can be installed on Unix systems with the command ‘pip install ldmat’. It is also available at https://github.com/G2Lab/ldmat and https://pypi.org/project/ldmat/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
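The summary lists three kinds of query against a compressed LD matrix: a genomic sub-region, a list of selected loci, and loci within a minor allele frequency range. The sketch below illustrates the general idea of queryable compression using only the Python standard library. It does not use LDmat or HDF5, and it is not a description of how LDmat stores data: the class name `CompressedLD`, the 16-bit quantization, and the per-row zlib compression are all hypothetical choices made for this example.

```python
import struct
import zlib
from bisect import bisect_left, bisect_right

SCALE = 32767  # hypothetical: quantize r in [-1, 1] to signed 16-bit integers


class CompressedLD:
    """Hypothetical in-memory sketch of a queryable compressed LD matrix.

    Row i stores only the upper-triangle entries r[i][i:], quantized to
    int16 and zlib-compressed independently, so a query decompresses only
    the rows of the loci it selects. Not LDmat's actual format.
    """

    def __init__(self, positions, mafs, ld):
        # Sorted positions let region queries use binary search.
        if list(positions) != sorted(positions):
            raise ValueError("positions must be sorted")
        self.positions = list(positions)
        self.mafs = list(mafs)
        self.rows = []
        n = len(self.positions)
        for i in range(n):
            tail = [round(ld[i][j] * SCALE) for j in range(i, n)]
            packed = struct.pack(f"<{len(tail)}h", *tail)
            self.rows.append(zlib.compress(packed))

    def _row(self, i):
        # Decompress one stored row back to its quantized integers.
        raw = zlib.decompress(self.rows[i])
        return struct.unpack(f"<{len(raw) // 2}h", raw)

    def _entry(self, i, j, cache):
        # LD is symmetric, so r[i][j] is read from the row of min(i, j).
        a, b = (i, j) if i <= j else (j, i)
        if a not in cache:
            cache[a] = self._row(a)
        return cache[a][b - a] / SCALE

    def submatrix(self, idx):
        cache = {}
        return [[self._entry(i, j, cache) for j in idx] for i in idx]

    def query_region(self, start, end):
        # Loci with start <= position <= end.
        lo = bisect_left(self.positions, start)
        hi = bisect_right(self.positions, end)
        idx = list(range(lo, hi))
        return [self.positions[i] for i in idx], self.submatrix(idx)

    def query_loci(self, wanted):
        # Loci matching the requested positions, in the requested order.
        index = {p: i for i, p in enumerate(self.positions)}
        idx = [index[p] for p in wanted if p in index]
        return [self.positions[i] for i in idx], self.submatrix(idx)

    def query_maf(self, lo, hi):
        # Loci whose minor allele frequency lies in [lo, hi].
        idx = [i for i, m in enumerate(self.mafs) if lo <= m <= hi]
        return [self.positions[i] for i in idx], self.submatrix(idx)


if __name__ == "__main__":
    positions = [100, 250, 400, 900]
    mafs = [0.05, 0.30, 0.12, 0.45]
    ld = [[1.0, 0.8, 0.1, 0.0],
          [0.8, 1.0, 0.2, -0.1],
          [0.1, 0.2, 1.0, 0.6],
          [0.0, -0.1, 0.6, 1.0]]
    m = CompressedLD(positions, mafs, ld)
    print(m.query_region(200, 500)[0])  # [250, 400]
    print(m.query_maf(0.1, 0.35)[0])    # [250, 400]
```

Storing only the upper triangle roughly halves storage because an LD matrix is symmetric, and compressing each row separately keeps queries cheap, since a small submatrix never requires decompressing the whole matrix. Quantization makes the stored values approximate (here to about 1.5e-5), which is a trade-off a real tool may or may not make.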