SIMBSIG: similarity search and clustering for biobank-scale data
SUMMARY: In many modern bioinformatics applications, such as statistical genetics, or single-cell analysis, one frequently encounters datasets which are orders of magnitude too large for conventional in-memory analysis. To tackle this challenge, we introduce SIMBSIG (SIMmilarity Batched Search Integ...
Main authors: | Adamer, Michael F; Roellin, Eljas; Bourguignon, Lucie; Borgwardt, Karsten |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Applications Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825260/ https://www.ncbi.nlm.nih.gov/pubmed/36610707 http://dx.doi.org/10.1093/bioinformatics/btac829 |
_version_ | 1784866600422211584 |
---|---|
author | Adamer, Michael F; Roellin, Eljas; Bourguignon, Lucie; Borgwardt, Karsten |
author_facet | Adamer, Michael F; Roellin, Eljas; Bourguignon, Lucie; Borgwardt, Karsten |
author_sort | Adamer, Michael F |
collection | PubMed |
description | SUMMARY: In many modern bioinformatics applications, such as statistical genetics, or single-cell analysis, one frequently encounters datasets which are orders of magnitude too large for conventional in-memory analysis. To tackle this challenge, we introduce SIMBSIG (SIMmilarity Batched Search Integrated GPU), a highly scalable Python package which provides a scikit-learn-like interface for out-of-core, GPU-enabled similarity searches, principal component analysis and clustering. Due to the PyTorch backend, it is highly modular and particularly tailored to many data types with a particular focus on biobank data analysis. AVAILABILITY AND IMPLEMENTATION: SIMBSIG is freely available from PyPI and its source code and documentation can be found on GitHub (https://github.com/BorgwardtLab/simbsig) under a BSD-3 license. |
format | Online Article Text |
id | pubmed-9825260 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9825260 2023-01-09 SIMBSIG: similarity search and clustering for biobank-scale data Adamer, Michael F Roellin, Eljas Bourguignon, Lucie Borgwardt, Karsten Bioinformatics Applications Note SUMMARY: In many modern bioinformatics applications, such as statistical genetics, or single-cell analysis, one frequently encounters datasets which are orders of magnitude too large for conventional in-memory analysis. To tackle this challenge, we introduce SIMBSIG (SIMmilarity Batched Search Integrated GPU), a highly scalable Python package which provides a scikit-learn-like interface for out-of-core, GPU-enabled similarity searches, principal component analysis and clustering. Due to the PyTorch backend, it is highly modular and particularly tailored to many data types with a particular focus on biobank data analysis. AVAILABILITY AND IMPLEMENTATION: SIMBSIG is freely available from PyPI and its source code and documentation can be found on GitHub (https://github.com/BorgwardtLab/simbsig) under a BSD-3 license. Oxford University Press 2022-12-23 /pmc/articles/PMC9825260/ /pubmed/36610707 http://dx.doi.org/10.1093/bioinformatics/btac829 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Applications Note Adamer, Michael F Roellin, Eljas Bourguignon, Lucie Borgwardt, Karsten SIMBSIG: similarity search and clustering for biobank-scale data |
title | SIMBSIG: similarity search and clustering for biobank-scale data |
topic | Applications Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825260/ https://www.ncbi.nlm.nih.gov/pubmed/36610707 http://dx.doi.org/10.1093/bioinformatics/btac829 |
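
The description above mentions out-of-core similarity search. As a rough illustration of that general idea only, the following is a minimal NumPy sketch of a brute-force k-nearest-neighbour search that reads the dataset in batches, so that only one batch of distances is in memory at a time. It is not SIMBSIG's API or implementation: the function name `batched_knn`, its parameters, and the choice of Euclidean distance are all assumptions made for this example, and it runs on the CPU rather than through PyTorch.

```python
import numpy as np


def batched_knn(queries, data, k, batch_size=1024):
    """Hypothetical helper: k nearest neighbours of each query, scanning `data` in batches.

    `data` only needs to support row slicing (for example a np.memmap or an
    HDF5 dataset), so the full matrix never has to be loaded at once.
    """
    queries = np.asarray(queries, dtype=np.float64)
    n_queries = queries.shape[0]

    # Running top-k per query: distances start at +inf, indices at -1.
    best_dist = np.full((n_queries, k), np.inf)
    best_idx = np.full((n_queries, k), -1, dtype=np.int64)
    q_sq = (queries ** 2).sum(axis=1)[:, None]

    for start in range(0, data.shape[0], batch_size):
        batch = np.asarray(data[start:start + batch_size], dtype=np.float64)

        # Squared Euclidean distances via |q|^2 - 2 q.b + |b|^2.
        d = q_sq - 2.0 * (queries @ batch.T) + (batch ** 2).sum(axis=1)[None, :]
        np.maximum(d, 0.0, out=d)  # clip tiny negatives caused by rounding

        # Global row indices of this batch, repeated for every query.
        idx = np.broadcast_to(
            np.arange(start, start + batch.shape[0], dtype=np.int64),
            d.shape,
        )

        # Merge the running top-k with the new candidates and keep the k smallest.
        all_d = np.concatenate([best_dist, d], axis=1)
        all_i = np.concatenate([best_idx, idx], axis=1)
        order = np.argsort(all_d, axis=1)[:, :k]
        best_dist = np.take_along_axis(all_d, order, axis=1)
        best_idx = np.take_along_axis(all_i, order, axis=1)

    return np.sqrt(best_dist), best_idx
```

Because the running top-k is merged with each batch, peak memory scales with `batch_size` rather than with the number of rows in `data`. The trade-off in this sketch is one sort per batch.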