Large-scale virtual screening on public cloud resources with Apache Spark
BACKGROUND: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the av...
Main authors: | Marco Capuccini, Laeeq Ahmed, Wesley Schaal, Erwin Laure, Ola Spjuth |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2017 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5339264/ https://www.ncbi.nlm.nih.gov/pubmed/28316653 http://dx.doi.org/10.1186/s13321-017-0204-4 |
_version_ | 1782512624718577664 |
---|---|
author | Capuccini, Marco Ahmed, Laeeq Schaal, Wesley Laure, Erwin Spjuth, Ola |
author_facet | Capuccini, Marco Ahmed, Laeeq Schaal, Wesley Laure, Erwin Spjuth, Ola |
author_sort | Capuccini, Marco |
collection | PubMed |
description | BACKGROUND: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the message passing interface, relying on low-failure-rate hardware and fast network connections. Google’s MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, and providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. RESULTS: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against [Formula: see text] 2.2 M compounds. The performance experiments show good parallel efficiency (87%) when running in a public cloud environment. CONCLUSION: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows users to try out our method on relatively small libraries first and then scale to larger libraries. Our implementation is named Spark-VS and is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs). [Figure: see text] |
format | Online Article Text |
id | pubmed-5339264 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-53392642017-03-17 Large-scale virtual screening on public cloud resources with Apache Spark Capuccini, Marco Ahmed, Laeeq Schaal, Wesley Laure, Erwin Spjuth, Ola J Cheminform Methodology BACKGROUND: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive, however it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on message passing interface, relying on low failure rate hardware and fast network connection. Google’s MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. RESULTS: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against [Formula: see text] 2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. CONCLUSION: Our method enables parallel Structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then to scale to larger libraries. Our implementation is named Spark-VS and it is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs). 
[Figure: see text] Springer International Publishing 2017-03-06 /pmc/articles/PMC5339264/ /pubmed/28316653 http://dx.doi.org/10.1186/s13321-017-0204-4 Text en © The Author(s) 2017 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Capuccini, Marco Ahmed, Laeeq Schaal, Wesley Laure, Erwin Spjuth, Ola Large-scale virtual screening on public cloud resources with Apache Spark |
title | Large-scale virtual screening on public cloud resources with Apache Spark |
title_full | Large-scale virtual screening on public cloud resources with Apache Spark |
title_fullStr | Large-scale virtual screening on public cloud resources with Apache Spark |
title_full_unstemmed | Large-scale virtual screening on public cloud resources with Apache Spark |
title_short | Large-scale virtual screening on public cloud resources with Apache Spark |
title_sort | large-scale virtual screening on public cloud resources with apache spark |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5339264/ https://www.ncbi.nlm.nih.gov/pubmed/28316653 http://dx.doi.org/10.1186/s13321-017-0204-4 |
work_keys_str_mv | AT capuccinimarco largescalevirtualscreeningonpubliccloudresourceswithapachespark AT ahmedlaeeq largescalevirtualscreeningonpubliccloudresourceswithapachespark AT schaalwesley largescalevirtualscreeningonpubliccloudresourceswithapachespark AT laureerwin largescalevirtualscreeningonpubliccloudresourceswithapachespark AT spjuthola largescalevirtualscreeningonpubliccloudresourceswithapachespark |