Blazing Signature Filter: a library for fast pairwise similarity comparisons
BACKGROUND: Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would...
Main authors: | Lee, Joon-Yong; Fujimoto, Grant M.; Wilson, Ryan; Wiley, H. Steven; Payne, Samuel H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2018
|
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6047367/ https://www.ncbi.nlm.nih.gov/pubmed/29890950 http://dx.doi.org/10.1186/s12859-018-2210-6 |
_version_ | 1783339938655240192 |
---|---|
author | Lee, Joon-Yong Fujimoto, Grant M. Wilson, Ryan Wiley, H. Steven Payne, Samuel H. |
author_facet | Lee, Joon-Yong Fujimoto, Grant M. Wilson, Ryan Wiley, H. Steven Payne, Samuel H. |
author_sort | Lee, Joon-Yong |
collection | PubMed |
description | BACKGROUND: Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computation will have broad impacts as it allows scientists to more completely explore the wealth of scientific data. RESULTS: The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm which enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to utilize the computationally efficient bit operators and provide a coarse measure of similarity. We demonstrate the utility of our algorithm using two common bioinformatics tasks: identifying data sets with similar gene expression profiles, and comparing annotated genomes. CONCLUSIONS: The BSF is a highly efficient pairwise similarity algorithm that can scale to billions of comparisons without the need for specialized hardware. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2210-6) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6047367 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-60473672018-07-19 Blazing Signature Filter: a library for fast pairwise similarity comparisons Lee, Joon-Yong Fujimoto, Grant M. Wilson, Ryan Wiley, H. Steven Payne, Samuel H. BMC Bioinformatics Software
BioMed Central 2018-06-11 /pmc/articles/PMC6047367/ /pubmed/29890950 http://dx.doi.org/10.1186/s12859-018-2210-6 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Lee, Joon-Yong Fujimoto, Grant M. Wilson, Ryan Wiley, H. Steven Payne, Samuel H. Blazing Signature Filter: a library for fast pairwise similarity comparisons |
title | Blazing Signature Filter: a library for fast pairwise similarity comparisons |
title_full | Blazing Signature Filter: a library for fast pairwise similarity comparisons |
title_fullStr | Blazing Signature Filter: a library for fast pairwise similarity comparisons |
title_full_unstemmed | Blazing Signature Filter: a library for fast pairwise similarity comparisons |
title_short | Blazing Signature Filter: a library for fast pairwise similarity comparisons |
title_sort | blazing signature filter: a library for fast pairwise similarity comparisons |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6047367/ https://www.ncbi.nlm.nih.gov/pubmed/29890950 http://dx.doi.org/10.1186/s12859-018-2210-6 |
work_keys_str_mv | AT leejoonyong blazingsignaturefilteralibraryforfastpairwisesimilaritycomparisons AT fujimotograntm blazingsignaturefilteralibraryforfastpairwisesimilaritycomparisons AT wilsonryan blazingsignaturefilteralibraryforfastpairwisesimilaritycomparisons AT wileyhsteven blazingsignaturefilteralibraryforfastpairwisesimilaritycomparisons AT paynesamuelh blazingsignaturefilteralibraryforfastpairwisesimilaritycomparisons |
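
The description above says the algorithm transforms datasets into binary form so that bit operators can give a coarse measure of similarity. A minimal Python sketch of that general idea follows. It is an illustration only, not the BSF library's API or the authors' implementation: the function names, the threshold-based binarization, and the sample profiles are all assumptions made for this example.

```python
from itertools import combinations


def pack_bits(flags):
    """Pack an iterable of truthy/falsy flags into one int (bit i = flags[i])."""
    word = 0
    for i, flag in enumerate(flags):
        if flag:
            word |= 1 << i
    return word


def binarize(values, threshold):
    """Assumed binarization rule: a feature is 'on' if its value >= threshold."""
    return pack_bits(v >= threshold for v in values)


def shared_count(a, b):
    """Coarse similarity: number of features 'on' in both signatures (AND + popcount)."""
    return bin(a & b).count("1")


def pairwise_shared(signatures):
    """Shared-feature count for every unordered pair of signatures."""
    return {
        (i, j): shared_count(signatures[i], signatures[j])
        for i, j in combinations(range(len(signatures)), 2)
    }


# Hypothetical expression-like profiles (made-up numbers, five features each).
profiles = [
    [2.1, 0.3, 1.8, 0.0, 2.5],
    [1.9, 0.1, 0.2, 0.4, 2.2],
    [0.0, 1.7, 0.1, 2.0, 0.3],
]
signatures = [binarize(p, threshold=1.5) for p in profiles]
overlaps = pairwise_shared(signatures)
print(overlaps)
```

Because each comparison reduces to one AND and one bit count over packed words, the cost per pair stays small, which is the property the description credits for scaling to very large numbers of comparisons. The shared count is only a coarse filter; pairs that pass it would typically be re-examined with a finer metric.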