
Large-scale virtual screening on public cloud resources with Apache Spark

Bibliographic Details
Main authors:	Capuccini, Marco; Ahmed, Laeeq; Schaal, Wesley; Laure, Erwin; Spjuth, Ola
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2017
Subjects:	Methodology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5339264/ https://www.ncbi.nlm.nih.gov/pubmed/28316653 http://dx.doi.org/10.1186/s13321-017-0204-4

author	Capuccini, Marco; Ahmed, Laeeq; Schaal, Wesley; Laure, Erwin; Spjuth, Ola
collection	PubMed
description	BACKGROUND: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the Message Passing Interface (MPI), relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open-source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. RESULTS: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against ~2.2 M compounds. The performance experiments show good parallel efficiency (87%) when running in a public cloud environment. CONCLUSION: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows users to try out our method on relatively small libraries first and then scale to larger libraries. Our implementation is named Spark-VS, and it is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs).
format	Online Article Text
id	pubmed-5339264
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
journal	J Cheminform (Springer International Publishing, 2017-03-06)
rights	© The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Large-scale virtual screening on public cloud resources with Apache Spark
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5339264/ https://www.ncbi.nlm.nih.gov/pubmed/28316653 http://dx.doi.org/10.1186/s13321-017-0204-4
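The abstract rests on one observation: docking each compound is independent of every other compound, so a library can be split across workers (the map step) and the per-worker hits merged into a global ranking (the reduce step). The Python sketch below illustrates that pattern on a single machine. It is not the Spark-VS implementation and does not use Spark; `toy_score` is a hypothetical stand-in for a real docking program, and the sketch assumes that a higher score means a better hit. `parallel_efficiency` uses the standard definition E = T1 / (p · Tp); the record does not state which definition the authors used for their 87% figure.

```python
"""Local MapReduce-style sketch of top-k virtual screening (not Spark-VS)."""
import heapq
from functools import reduce
from typing import Callable, List, Sequence, Tuple

# (score, molecule id); tuples compare by score first, then by id.
Hit = Tuple[float, str]


def toy_score(smiles: str) -> float:
    """Hypothetical stand-in for a docking program: scores a SMILES by its length."""
    return float(len(smiles))


def map_partition(partition: Sequence[str], score: Callable[[str], float], k: int) -> List[Hit]:
    """Map step: score every molecule in one partition and keep its local top k."""
    return heapq.nlargest(k, ((score(m), m) for m in partition))


def merge_top_k(k: int) -> Callable[[List[Hit], List[Hit]], List[Hit]]:
    """Build a reduce function that merges two partial top-k lists."""
    def merge(a: List[Hit], b: List[Hit]) -> List[Hit]:
        return heapq.nlargest(k, a + b)
    return merge


def screen(library: Sequence[str], score: Callable[[str], float], n_partitions: int, k: int) -> List[Hit]:
    """Split the library, score partitions independently, then reduce to a global top k."""
    # Round-robin split; on a cluster each partition would be handled by a different worker.
    partitions = [library[i::n_partitions] for i in range(n_partitions)]
    partial = [map_partition(p, score, k) for p in partitions]
    return reduce(merge_top_k(k), partial, [])


def parallel_efficiency(t_serial: float, t_parallel: float, n_workers: int) -> float:
    """Standard parallel efficiency E = T1 / (p * Tp)."""
    return t_serial / (n_workers * t_parallel)


if __name__ == "__main__":
    library = ["C", "CC", "CCO", "c1ccccc1", "CCN"]
    print(screen(library, toy_score, n_partitions=2, k=2))
```

Running the script prints `[(8.0, 'c1ccccc1'), (3.0, 'CCO')]`. Any compound in the global top k must also be in the top k of its own partition, so merging the partial lists gives the same ranking as scoring the whole library serially, regardless of how many partitions are used. This independence is what makes the task "trivially parallelizable" in the sense of the abstract.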

Similar items