
Blazing Signature Filter: a library for fast pairwise similarity comparisons


Bibliographic Details
Main Authors: Lee, Joon-Yong; Fujimoto, Grant M.; Wilson, Ryan; Wiley, H. Steven; Payne, Samuel H.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6047367/
https://www.ncbi.nlm.nih.gov/pubmed/29890950
http://dx.doi.org/10.1186/s12859-018-2210-6
Description
Summary: BACKGROUND: Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts that would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity, or specificity of such computation will have broad impact, because they allow scientists to explore the wealth of scientific data more completely. RESULTS: The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm that enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to use computationally efficient bitwise operators and provide a coarse measure of similarity. We demonstrate the utility of our algorithm using two common bioinformatics tasks: identifying datasets with similar gene expression profiles, and comparing annotated genomes. CONCLUSIONS: The BSF is a highly efficient pairwise similarity algorithm that can scale to billions of comparisons without the need for specialized hardware. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2210-6) contains supplementary material, which is available to authorized users.
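The idea described in the summary, reducing each dataset to a binary signature and comparing signatures with bitwise operators, can be illustrated with a minimal Python sketch. This is not the BSF library's API: the function names (`to_signature`, `shared_bits`, `pairwise_shared`), the fixed threshold used for binarization, and the toy profiles are all assumptions made for illustration, and the published implementation may binarize and score differently.

```python
from itertools import combinations


def to_signature(values, threshold):
    """Pack a numeric profile into an int: bit i is set when values[i] > threshold.

    The threshold rule is an assumption for illustration only.
    """
    sig = 0
    for i, v in enumerate(values):
        if v > threshold:
            sig |= 1 << i
    return sig


def shared_bits(sig_a, sig_b):
    """Count positions set in both signatures (bitwise AND, then popcount)."""
    return bin(sig_a & sig_b).count("1")


def pairwise_shared(signatures):
    """Return {(i, j): shared bit count} for every unordered pair of signatures."""
    return {
        (i, j): shared_bits(signatures[i], signatures[j])
        for i, j in combinations(range(len(signatures)), 2)
    }


# Hypothetical toy profiles (e.g. expression values for four genes).
profiles = [
    [2.1, 0.0, 1.7, 0.2],
    [1.9, 0.1, 1.5, 1.8],
    [0.0, 2.2, 0.1, 0.0],
]
signatures = [to_signature(p, threshold=1.0) for p in profiles]
scores = pairwise_shared(signatures)
print(scores)
```

Because each comparison is a single AND followed by a bit count, the score is only a coarse measure of overlap, which matches the summary's description of the method as a filter rather than a full similarity metric.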