
SIMBSIG: similarity search and clustering for biobank-scale data

SUMMARY: In many modern bioinformatics applications, such as statistical genetics or single-cell analysis, one frequently encounters datasets which are orders of magnitude too large for conventional in-memory analysis. To tackle this challenge, we introduce SIMBSIG (SIMilarity Batched Search Integ...


Bibliographic Details
Main authors: Adamer, Michael F, Roellin, Eljas, Bourguignon, Lucie, Borgwardt, Karsten
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Applications Note
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825260/
https://www.ncbi.nlm.nih.gov/pubmed/36610707
http://dx.doi.org/10.1093/bioinformatics/btac829
collection PubMed
description SUMMARY: In many modern bioinformatics applications, such as statistical genetics or single-cell analysis, one frequently encounters datasets which are orders of magnitude too large for conventional in-memory analysis. To tackle this challenge, we introduce SIMBSIG (SIMilarity Batched Search Integrated GPU), a highly scalable Python package which provides a scikit-learn-like interface for out-of-core, GPU-enabled similarity searches, principal component analysis and clustering. Due to the PyTorch backend, it is highly modular and particularly tailored to many data types with a particular focus on biobank data analysis. AVAILABILITY AND IMPLEMENTATION: SIMBSIG is freely available from PyPI and its source code and documentation can be found on GitHub (https://github.com/BorgwardtLab/simbsig) under a BSD-3 license.
id pubmed-9825260
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9825260 2023-01-09 Bioinformatics, Applications Note. Oxford University Press, 2022-12-23. Text, en. © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Applications Note