
rstoolbox - a Python library for large-scale analysis of computational protein design data and structural bioinformatics

BACKGROUND: Large-scale datasets of protein structures and sequences are becoming ubiquitous in many domains of biological research. Experimental approaches and computational modelling methods are generating biological data at an unprecedented rate. The detailed analysis of structure-sequence relati...


Bibliographic Details
Main authors: Bonet, Jaume, Harteveld, Zander, Sesterhenn, Fabian, Scheck, Andreas, Correia, Bruno E.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6521408/
https://www.ncbi.nlm.nih.gov/pubmed/31092198
http://dx.doi.org/10.1186/s12859-019-2796-3
_version_ 1783418951265419264
author Bonet, Jaume
Harteveld, Zander
Sesterhenn, Fabian
Scheck, Andreas
Correia, Bruno E.
author_sort Bonet, Jaume
collection PubMed
description BACKGROUND: Large-scale datasets of protein structures and sequences are becoming ubiquitous in many domains of biological research. Experimental approaches and computational modelling methods are generating biological data at an unprecedented rate. The detailed analysis of structure-sequence relationships is critical to unveil governing principles of protein folding, stability and function. Computational protein design (CPD) has emerged as an important structure-based approach to engineer proteins for novel functions. Generally, CPD workflows rely on the generation of large numbers of structural models to search for the optimal structure-sequence configurations. As such, an important step of the CPD process is the selection of a small subset of sequences to be experimentally characterized. Given the limitations of current CPD scoring functions, multi-step design protocols and elaborate analysis of the decoy populations have become essential for the selection of sequences for experimental characterization and the success of CPD strategies.
RESULTS: Here, we present rstoolbox, a Python library for the analysis of large-scale structural data tailored for CPD applications. rstoolbox is oriented towards both CPD software users and developers, and is easily integrated into analysis workflows. For users, it offers the ability to profile and select decoy sets, which may guide multi-step design protocols or inform follow-up experimental characterization. rstoolbox provides intuitive solutions for the visualization of large sequence/structure datasets (e.g. logo plots and heatmaps) and facilitates the analysis of experimental data obtained through traditional biochemical techniques (e.g. circular dichroism and surface plasmon resonance) and high-throughput sequencing. For CPD software developers, it provides a framework to easily benchmark and compare different CPD approaches. Here, we showcase rstoolbox in both types of applications.
CONCLUSIONS: rstoolbox is a library for the evaluation of protein structure datasets tailored for CPD data. It provides interactive access through seamless integration with IPython, while still being suitable for high-performance computing. Beyond its functionalities for data analysis and graphical representation, including rstoolbox in protein design pipelines will make it easier to standardize the selection of design candidates and to improve the overall reproducibility and robustness of CPD selection processes.
format Online
Article
Text
id pubmed-6521408
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6521408 2019-05-23 rstoolbox - a Python library for large-scale analysis of computational protein design data and structural bioinformatics Bonet, Jaume Harteveld, Zander Sesterhenn, Fabian Scheck, Andreas Correia, Bruno E. BMC Bioinformatics Software BioMed Central 2019-05-15 /pmc/articles/PMC6521408/ /pubmed/31092198 http://dx.doi.org/10.1186/s12859-019-2796-3 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title rstoolbox - a Python library for large-scale analysis of computational protein design data and structural bioinformatics
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6521408/
https://www.ncbi.nlm.nih.gov/pubmed/31092198
http://dx.doi.org/10.1186/s12859-019-2796-3