
Lemon: a framework for rapidly mining structural information from the Protein Data Bank

MOTIVATION: The Protein Data Bank (PDB) currently holds over 140 000 biomolecular structures and continues to release new structures on a weekly basis. The PDB is an essential resource for the structural bioinformatics community in developing software that mines, uses, categorizes and analyzes such data. Ne...


Bibliographic Details
Main Authors:	Fine, Jonathan, Chopra, Gaurav
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Applications Notes
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6792122/ https://www.ncbi.nlm.nih.gov/pubmed/30873531 http://dx.doi.org/10.1093/bioinformatics/btz178

_version_	1783459085727825920
author	Fine, Jonathan Chopra, Gaurav
author_facet	Fine, Jonathan Chopra, Gaurav
author_sort	Fine, Jonathan
collection	PubMed
description	MOTIVATION: The Protein Data Bank (PDB) currently holds over 140 000 biomolecular structures and continues to release new structures on a weekly basis. The PDB is an essential resource for the structural bioinformatics community in developing software that mines, uses, categorizes and analyzes such data. New computational biology methods are evaluated using custom benchmarking sets derived as subsets of 3D experimentally determined structures and structural features from the PDB. Currently, such benchmarking features are manually curated with custom scripts in a non-standardized manner, which results in slow distribution and slow updates as new experimental structures appear. Finally, there is a scarcity of standardized tools to rapidly query 3D descriptors of the entire PDB. RESULTS: Our solution is the Lemon framework, a C++11 library with Python bindings, which provides a consistent workflow methodology for selecting biomolecular interactions based on user criteria and computing desired 3D structural features. This framework can parse and characterize the entire PDB in <10 min on modern, multithreaded hardware. The speed in parsing is obtained by using the recently developed MacroMolecule Transmission Format to reduce the computational cost of reading text-based PDB files. The use of C++ lambda functions and Python bindings provides extensive flexibility for analysis and categorization of the PDB by allowing the user to write custom functions to suit their objective. We think Lemon will become a one-stop shop to quickly mine the entire PDB to generate desired structural biology features. AVAILABILITY AND IMPLEMENTATION: The Lemon software is available as a C++ header library along with a PyPI package and example functions at https://github.com/chopralab/lemon. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-6792122
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6792122 2019-10-18 Lemon: a framework for rapidly mining structural information from the Protein Data Bank Fine, Jonathan Chopra, Gaurav Bioinformatics Applications Notes Oxford University Press 2019-10-15 2019-03-14 /pmc/articles/PMC6792122/ /pubmed/30873531 http://dx.doi.org/10.1093/bioinformatics/btz178 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Applications Notes Fine, Jonathan Chopra, Gaurav Lemon: a framework for rapidly mining structural information from the Protein Data Bank
title	Lemon: a framework for rapidly mining structural information from the Protein Data Bank
topic	Applications Notes
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6792122/ https://www.ncbi.nlm.nih.gov/pubmed/30873531 http://dx.doi.org/10.1093/bioinformatics/btz178
work_keys_str_mv	AT finejonathan lemonaframeworkforrapidlyminingstructuralinformationfromtheproteindatabank AT chopragaurav lemonaframeworkforrapidlyminingstructuralinformationfromtheproteindatabank
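
The abstract describes a workflow in which a user-written function is applied to every PDB entry on multithreaded hardware and the per-entry results are combined. The sketch below illustrates only that general pattern in plain Python. It does not use Lemon's API: `TOY_ENTRIES`, `run_workflow` and `count_residue` are hypothetical names invented for this illustration, and the in-memory residue lists stand in for structures that a real workflow would parse from MMTF files.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for parsed structures: entry ID -> residue names.
# These IDs and contents are made up; they are not real PDB data.
TOY_ENTRIES = {
    "1ABC": ["ALA", "GLY", "HOH", "HEM"],
    "2XYZ": ["GLY", "GLY", "ZN", "HOH"],
    "3DEF": ["SER", "HEM", "HEM"],
}


def run_workflow(entries, worker, threads=4):
    """Apply a user-supplied worker to every entry in a thread pool
    and merge the per-entry Counters into a single total."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each item is an (entry_id, residues) pair unpacked into the worker.
        for partial in pool.map(lambda item: worker(*item), entries.items()):
            total.update(partial)
    return total


def count_residue(name):
    """Build a worker that counts one residue name within a single entry."""
    return lambda entry_id, residues: Counter({name: residues.count(name)})


heme_counts = run_workflow(TOY_ENTRIES, count_residue("HEM"))
print(heme_counts["HEM"])  # prints 3
```

Keeping the selection criterion in a small user-supplied function, separate from the code that walks the entries, mirrors the flexibility the abstract attributes to C++ lambda functions and Python bindings: the traversal stays fixed while the analysis changes.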
