Cargando…

SIMBSIG: similarity search and clustering for biobank-scale data

SUMMARY: In many modern bioinformatics applications, such as statistical genetics, or single-cell analysis, one frequently encounters datasets which are orders of magnitude too large for conventional in-memory analysis. To tackle this challenge, we introduce SIMBSIG (SIMmilarity Batched Search Integ...

Descripción completa

Detalles Bibliográficos
Autores principales:	Adamer, Michael F, Roellin, Eljas, Bourguignon, Lucie, Borgwardt, Karsten
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Oxford University Press 2022
Materias:	Applications Note
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825260/ https://www.ncbi.nlm.nih.gov/pubmed/36610707 http://dx.doi.org/10.1093/bioinformatics/btac829

Descripción
Sumario:	SUMMARY: In many modern bioinformatics applications, such as statistical genetics, or single-cell analysis, one frequently encounters datasets which are orders of magnitude too large for conventional in-memory analysis. To tackle this challenge, we introduce SIMBSIG (SIMmilarity Batched Search Integrated GPU), a highly scalable Python package which provides a scikit-learn-like interface for out-of-core, GPU-enabled similarity searches, principal component analysis and clustering. Due to the PyTorch backend, it is highly modular and particularly tailored to many data types with a particular focus on biobank data analysis. AVAILABILITY AND IMPLEMENTATION: SIMBSIG is freely available from PyPI and its source code and documentation can be found on GitHub (https://github.com/BorgwardtLab/simbsig) under a BSD-3 license.

SIMBSIG: similarity search and clustering for biobank-scale data

Ejemplares similares