ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics

Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models b...

Bibliographic Details
Main Authors: Sun, Jiangming, Jeliazkova, Nina, Chupakhin, Vladimir, Golib-Dzib, Jose-Felipe, Engkvist, Ola, Carlsson, Lars, Wegner, Jörg, Ceulemans, Hugo, Georgiev, Ivan, Jeliazkov, Vedrin, Kochev, Nikolay, Ashby, Thomas J., Chen, Hongming
Format: Online Article Text
Language: English
Published: Springer International Publishing 2017
Subjects: Database
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5340785/
https://www.ncbi.nlm.nih.gov/pubmed/28316655
http://dx.doi.org/10.1186/s13321-017-0203-5
author Sun, Jiangming
Jeliazkova, Nina
Chupakhin, Vladimir
Golib-Dzib, Jose-Felipe
Engkvist, Ola
Carlsson, Lars
Wegner, Jörg
Ceulemans, Hugo
Georgiev, Ivan
Jeliazkov, Vedrin
Kochev, Nikolay
Ashby, Thomas J.
Chen, Hongming
author_sort Sun, Jiangming
collection PubMed
description Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models based on Big Data. Preparing a high quality data set is a vital step in realizing this goal and this work aims to compile such a comprehensive chemogenomics dataset. This dataset comprises over 70 million SAR data points from publicly available databases (PubChem and ChEMBL) including structure, target information and activity annotations. Our aspiration is to create a useful chemogenomics resource reflecting industry-scale data not only for building predictive models of in silico polypharmacology and off-target effects but also for the validation of cheminformatics approaches in general. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-017-0203-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5340785
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-5340785 2017-03-17 ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics Sun, Jiangming Jeliazkova, Nina Chupakhin, Vladimir Golib-Dzib, Jose-Felipe Engkvist, Ola Carlsson, Lars Wegner, Jörg Ceulemans, Hugo Georgiev, Ivan Jeliazkov, Vedrin Kochev, Nikolay Ashby, Thomas J. Chen, Hongming J Cheminform Database Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models based on Big Data. Preparing a high quality data set is a vital step in realizing this goal and this work aims to compile such a comprehensive chemogenomics dataset. This dataset comprises over 70 million SAR data points from publicly available databases (PubChem and ChEMBL) including structure, target information and activity annotations. Our aspiration is to create a useful chemogenomics resource reflecting industry-scale data not only for building predictive models of in silico polypharmacology and off-target effects but also for the validation of cheminformatics approaches in general. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-017-0203-5) contains supplementary material, which is available to authorized users. Springer International Publishing 2017-03-07 /pmc/articles/PMC5340785/ /pubmed/28316655 http://dx.doi.org/10.1186/s13321-017-0203-5 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics
topic Database