Cargando…

cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining

BACKGROUND: Frequent subgraphs mining is a significant problem in many practical domains. The solution of this kind of problem can particularly used in some large-scale drug molecular or biological libraries to help us find drugs or core biological structures rapidly and predict toxicity of some unk...

Descripción completa

Detalles Bibliográficos
Autores principales: Yang, Shunyun, Guo, Runxin, Liu, Rui, Liao, Xiangke, Zou, Quan, Shi, Benyun, Peng, Shaoliang
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998871/
https://www.ncbi.nlm.nih.gov/pubmed/29745832
http://dx.doi.org/10.1186/s12859-018-2071-z
_version_ 1783331318509076480
author Yang, Shunyun
Guo, Runxin
Liu, Rui
Liao, Xiangke
Zou, Quan
Shi, Benyun
Peng, Shaoliang
author_facet Yang, Shunyun
Guo, Runxin
Liu, Rui
Liao, Xiangke
Zou, Quan
Shi, Benyun
Peng, Shaoliang
author_sort Yang, Shunyun
collection PubMed
description BACKGROUND: Frequent subgraphs mining is a significant problem in many practical domains. The solution of this kind of problem can particularly used in some large-scale drug molecular or biological libraries to help us find drugs or core biological structures rapidly and predict toxicity of some unknown compounds. The main challenge is its efficiency, as (i) it is computationally intensive to test for graph isomorphisms, and (ii) the graph collection to be mined and mining results can be very large. Existing solutions often require days to derive mining results from biological networks even with relative low support threshold. Also, the whole mining results always cannot be stored in single node memory. RESULTS: In this paper, we implement a parallel acceleration tool for classical frequent subgraph mining algorithm called cmFSM. The core idea is to employ parallel techniques to parallelize extension tasks, so as to reduce computation time. On the other hand, we employ multi-node strategy to solve the problem of memory constraints. The parallel optimization of cmFSM is carried out on three different levels, including the fine-grained OpenMP parallelization on single node, multi-node multi-process parallel acceleration and CPU-MIC collaborated parallel optimization. CONCLUSIONS: Evaluation results show that cmFSM clearly outperforms the existing state-of-the-art miners even if we only hold a few parallel computing resources. It means that cmFSM provides a practical solution to frequent subgraph mining problem with huge number of mining results. Specifically, our solution is up to one order of magnitude faster than the best CPU-based approach on single node and presents a promising scalability of massive mining tasks in multi-node scenario. More source code are available at:Source Code: https://github.com/ysycloud/cmFSM.
format Online
Article
Text
id pubmed-5998871
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-59988712018-06-25 cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining Yang, Shunyun Guo, Runxin Liu, Rui Liao, Xiangke Zou, Quan Shi, Benyun Peng, Shaoliang BMC Bioinformatics Research BACKGROUND: Frequent subgraphs mining is a significant problem in many practical domains. The solution of this kind of problem can particularly used in some large-scale drug molecular or biological libraries to help us find drugs or core biological structures rapidly and predict toxicity of some unknown compounds. The main challenge is its efficiency, as (i) it is computationally intensive to test for graph isomorphisms, and (ii) the graph collection to be mined and mining results can be very large. Existing solutions often require days to derive mining results from biological networks even with relative low support threshold. Also, the whole mining results always cannot be stored in single node memory. RESULTS: In this paper, we implement a parallel acceleration tool for classical frequent subgraph mining algorithm called cmFSM. The core idea is to employ parallel techniques to parallelize extension tasks, so as to reduce computation time. On the other hand, we employ multi-node strategy to solve the problem of memory constraints. The parallel optimization of cmFSM is carried out on three different levels, including the fine-grained OpenMP parallelization on single node, multi-node multi-process parallel acceleration and CPU-MIC collaborated parallel optimization. CONCLUSIONS: Evaluation results show that cmFSM clearly outperforms the existing state-of-the-art miners even if we only hold a few parallel computing resources. It means that cmFSM provides a practical solution to frequent subgraph mining problem with huge number of mining results. Specifically, our solution is up to one order of magnitude faster than the best CPU-based approach on single node and presents a promising scalability of massive mining tasks in multi-node scenario. More source code are available at:Source Code: https://github.com/ysycloud/cmFSM. BioMed Central 2018-05-08 /pmc/articles/PMC5998871/ /pubmed/29745832 http://dx.doi.org/10.1186/s12859-018-2071-z Text en © The Author(s). 2018 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Yang, Shunyun
Guo, Runxin
Liu, Rui
Liao, Xiangke
Zou, Quan
Shi, Benyun
Peng, Shaoliang
cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining
title cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining
title_full cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining
title_fullStr cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining
title_full_unstemmed cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining
title_short cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining
title_sort cmfsm: a scalable cpu-mic coordinated drug-finding tool by frequent subgraph mining
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998871/
https://www.ncbi.nlm.nih.gov/pubmed/29745832
http://dx.doi.org/10.1186/s12859-018-2071-z
work_keys_str_mv AT yangshunyun cmfsmascalablecpumiccoordinateddrugfindingtoolbyfrequentsubgraphmining
AT guorunxin cmfsmascalablecpumiccoordinateddrugfindingtoolbyfrequentsubgraphmining
AT liurui cmfsmascalablecpumiccoordinateddrugfindingtoolbyfrequentsubgraphmining
AT liaoxiangke cmfsmascalablecpumiccoordinateddrugfindingtoolbyfrequentsubgraphmining
AT zouquan cmfsmascalablecpumiccoordinateddrugfindingtoolbyfrequentsubgraphmining
AT shibenyun cmfsmascalablecpumiccoordinateddrugfindingtoolbyfrequentsubgraphmining
AT pengshaoliang cmfsmascalablecpumiccoordinateddrugfindingtoolbyfrequentsubgraphmining