Ensembler: Enabling High-Throughput Molecular Simulations at the Superfamily Scale

The rapidly expanding body of available genomic and protein structural data provides a rich resource for understanding protein dynamics with biomolecular simulation. While computational infrastructure has grown rapidly, simulations on an omics scale are not yet widespread, primarily because software...

Bibliographic Details
Main Authors: Parton, Daniel L., Grinaway, Patrick B., Hanson, Sonya M., Beauchamp, Kyle A., Chodera, John D.
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4918922/
https://www.ncbi.nlm.nih.gov/pubmed/27337644
http://dx.doi.org/10.1371/journal.pcbi.1004728
author Parton, Daniel L.
Grinaway, Patrick B.
Hanson, Sonya M.
Beauchamp, Kyle A.
Chodera, John D.
collection PubMed
description The rapidly expanding body of available genomic and protein structural data provides a rich resource for understanding protein dynamics with biomolecular simulation. While computational infrastructure has grown rapidly, simulations on an omics scale are not yet widespread, primarily because software infrastructure to enable simulations at this scale has not kept pace. It should now be possible to study protein dynamics across entire (super)families, exploiting both available structural biology data and conformational similarities across homologous proteins. Here, we present a new tool for enabling high-throughput simulation in the genomics era. Ensembler takes any set of sequences—from a single sequence to an entire superfamily—and shepherds them through various stages of modeling and refinement to produce simulation-ready structures. This includes comparative modeling to all relevant PDB structures (which may span multiple conformational states of interest), reconstruction of missing loops, addition of missing atoms, culling of nearly identical structures, assignment of appropriate protonation states, solvation in explicit solvent, and refinement and filtering with molecular simulation to ensure stable simulation. The output of this pipeline is an ensemble of structures ready for subsequent molecular simulations using computer clusters, supercomputers, or distributed computing projects like Folding@home. Ensembler thus automates much of the time-consuming process of preparing protein models suitable for simulation, while allowing scalability up to entire superfamilies. A particular advantage of this approach can be found in the construction of kinetic models of conformational dynamics—such as Markov state models (MSMs)—which benefit from a diverse array of initial configurations that span the accessible conformational states to aid sampling. 
We demonstrate the power of this approach by constructing models for all catalytic domains in the human tyrosine kinase family, using all available kinase catalytic domain structures from any organism as structural templates. Ensembler is free and open source software licensed under the GNU General Public License (GPL) v2. It is compatible with Linux and OS X. The latest release can be installed via the conda package manager, and the latest source can be downloaded from https://github.com/choderalab/ensembler.
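One of the pipeline stages named in the abstract, culling of nearly identical structures, can be illustrated with a minimal sketch. This is not Ensembler's implementation or API: the function names, the greedy strategy, the RMSD cutoff, and the toy coordinates below are all assumptions made for illustration, and the structures are assumed to be superimposed already.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length coordinate lists.

    Assumption: the two structures are already superimposed; no alignment is done.
    """
    if len(a) != len(b):
        raise ValueError("structures must have the same number of atoms")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def cull_near_duplicates(structures, cutoff):
    """Greedily keep a structure only if it differs from every kept one by more than cutoff."""
    kept = []
    for name, coords in structures:
        if all(rmsd(coords, other) > cutoff for _, other in kept):
            kept.append((name, coords))
    return [name for name, _ in kept]


# Hypothetical two-atom "models" used purely as toy input.
models = [
    ("model_a", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    ("model_b", [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]),  # nearly identical to model_a
    ("model_c", [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)]),  # clearly different
]

print(cull_near_duplicates(models, cutoff=0.5))
```

With the toy input, model_b lies within the cutoff of model_a and is dropped, while model_c is retained. The result of a greedy pass depends on input order, which is one reason a real pipeline might prefer a proper clustering method.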
format Online
Article
Text
id pubmed-4918922
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4918922 2016-07-08 Ensembler: Enabling High-Throughput Molecular Simulations at the Superfamily Scale Parton, Daniel L. Grinaway, Patrick B. Hanson, Sonya M. Beauchamp, Kyle A. Chodera, John D. PLoS Comput Biol Research Article Public Library of Science 2016-06-23 /pmc/articles/PMC4918922/ /pubmed/27337644 http://dx.doi.org/10.1371/journal.pcbi.1004728 Text en © 2016 Parton et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Ensembler: Enabling High-Throughput Molecular Simulations at the Superfamily Scale
topic Research Article