
ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems


Bibliographic Details
Main authors: González-Domínguez, Jorge; Expósito, Roberto R.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5880350/
https://www.ncbi.nlm.nih.gov/pubmed/29608567
http://dx.doi.org/10.1371/journal.pone.0194361
author González-Domínguez, Jorge
Expósito, Roberto R.
collection PubMed
description Biclustering techniques are gaining attention in the analysis of large-scale datasets because they identify two-dimensional submatrices in which both rows and columns are correlated. In this work we present ParBiBit, a parallel tool that accelerates the search for interesting biclusters in binary datasets, which are very popular in fields such as genetics, marketing and text mining. It is based on the state-of-the-art sequential Java tool BiBit, which several studies have shown to be accurate, especially in scenarios that yield many large biclusters. ParBiBit uses the same methodology as BiBit (grouping the binary information into patterns) and provides the same results. However, our tool significantly improves performance thanks to an efficient C++11 implementation that supports both threads and MPI processes in order to exploit the compute capabilities of modern distributed-memory systems, which consist of several multicore CPU nodes interconnected through a network. Our performance evaluation with 18 representative input datasets on two different eight-node systems shows that our tool is significantly faster than the original BiBit. Source code in C++ and MPI for Linux systems, as well as a reference manual, is available at https://sourceforge.net/projects/parbibit/.
id pubmed-5880350
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5880350 2018-04-13 ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems González-Domínguez, Jorge Expósito, Roberto R. PLoS One Research Article
Public Library of Science 2018-04-02 http://dx.doi.org/10.1371/journal.pone.0194361 © 2018 González-Domínguez, Expósito. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article