
PyBioMed: a python library for various molecular representations of chemicals, proteins and DNAs and their interactions

BACKGROUND: With the rapid development of biotechnology and informatics, publicly available data in chemistry and biology are growing explosively. The wealth of information in these data needs to be extracted and transformed into useful knowledge by data mining methods....

Bibliographic Details
Main Authors: Dong, Jie; Yao, Zhi-Jiang; Zhang, Lin; Luo, Feijun; Lin, Qinlu; Lu, Ai-Ping; Chen, Alex F.; Cao, Dong-Sheng
Format: Online Article Text
Language: English
Published: Springer International Publishing 2018
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5861255/
https://www.ncbi.nlm.nih.gov/pubmed/29556758
http://dx.doi.org/10.1186/s13321-018-0270-2
_version_ 1783308061438377984
author Dong, Jie
Yao, Zhi-Jiang
Zhang, Lin
Luo, Feijun
Lin, Qinlu
Lu, Ai-Ping
Chen, Alex F.
Cao, Dong-Sheng
author_sort Dong, Jie
collection PubMed
description BACKGROUND: With the rapid development of biotechnology and informatics, publicly available data in chemistry and biology are growing explosively. The wealth of information in these data needs to be extracted and transformed into useful knowledge by data mining methods. Given the rate at which data accumulate in chemistry and biology, new tools that process and interpret large and complex interaction data are increasingly important. So far, no suitable toolkit effectively links chemical and biological space in terms of molecular representation. To explore these complex data further, an integrated toolkit for various molecular representations is urgently needed, one that can be easily combined with data mining algorithms to start a full data analysis pipeline. RESULTS: Herein, the Python library PyBioMed is presented. It comprises functionality for downloading various molecular objects online by their IDs, pretreating molecular structures, and computing various molecular descriptors for chemicals, proteins, DNAs and their interactions. PyBioMed is a feature-rich and highly customizable Python library for characterizing complex chemical and biological molecules and interaction samples. The current version of PyBioMed can calculate 775 chemical descriptors and 19 kinds of chemical fingerprints, 9920 protein descriptors based on protein sequences, more than 6000 DNA descriptors from nucleotide sequences, and interaction descriptors from pairwise samples using three different combining strategies. Several examples and five real-life applications are provided to show users how to use PyBioMed as an integral part of data analysis projects.
With PyBioMed, users can conveniently run a full pipeline from obtaining molecular data and pretreating molecules through molecular representation to constructing machine learning models. CONCLUSION: PyBioMed provides user-friendly and highly customizable APIs for conveniently calculating features of biological molecules and complex interaction samples, with the aim of building integrated analysis pipelines from data acquisition, data checking, and descriptor calculation to modeling. PyBioMed is freely available at http://projects.scbdd.com/pybiomed.html.
format Online
Article
Text
id pubmed-5861255
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-5861255 2018-03-23 PyBioMed: a python library for various molecular representations of chemicals, proteins and DNAs and their interactions Dong, Jie Yao, Zhi-Jiang Zhang, Lin Luo, Feijun Lin, Qinlu Lu, Ai-Ping Chen, Alex F. Cao, Dong-Sheng J Cheminform Software Springer International Publishing 2018-03-20 /pmc/articles/PMC5861255/ /pubmed/29556758 http://dx.doi.org/10.1186/s13321-018-0270-2 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title PyBioMed: a python library for various molecular representations of chemicals, proteins and DNAs and their interactions
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5861255/
https://www.ncbi.nlm.nih.gov/pubmed/29556758
http://dx.doi.org/10.1186/s13321-018-0270-2