ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics
Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models b...
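Main Authors: | Sun, Jiangming; Jeliazkova, Nina; Chupakhin, Vladimir; Golib-Dzib, Jose-Felipe; Engkvist, Ola; Carlsson, Lars; Wegner, Jörg; Ceulemans, Hugo; Georgiev, Ivan; Jeliazkov, Vedrin; Kochev, Nikolay; Ashby, Thomas J.; Chen, Hongming |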
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2017 |
Subjects: | Database |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5340785/ https://www.ncbi.nlm.nih.gov/pubmed/28316655 http://dx.doi.org/10.1186/s13321-017-0203-5 |
_version_ | 1782512867776397312 |
---|---|
author | Sun, Jiangming Jeliazkova, Nina Chupakhin, Vladimir Golib-Dzib, Jose-Felipe Engkvist, Ola Carlsson, Lars Wegner, Jörg Ceulemans, Hugo Georgiev, Ivan Jeliazkov, Vedrin Kochev, Nikolay Ashby, Thomas J. Chen, Hongming |
author_sort | Sun, Jiangming |
collection | PubMed |
description | Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models based on Big Data. Preparing a high quality data set is a vital step in realizing this goal and this work aims to compile such a comprehensive chemogenomics dataset. This dataset comprises over 70 million SAR data points from publicly available databases (PubChem and ChEMBL) including structure, target information and activity annotations. Our aspiration is to create a useful chemogenomics resource reflecting industry-scale data not only for building predictive models of in silico polypharmacology and off-target effects but also for the validation of cheminformatics approaches in general. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-017-0203-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5340785 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-5340785 2017-03-17 ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics Sun, Jiangming Jeliazkova, Nina Chupakhin, Vladimir Golib-Dzib, Jose-Felipe Engkvist, Ola Carlsson, Lars Wegner, Jörg Ceulemans, Hugo Georgiev, Ivan Jeliazkov, Vedrin Kochev, Nikolay Ashby, Thomas J. Chen, Hongming J Cheminform Database Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models based on Big Data. Preparing a high quality data set is a vital step in realizing this goal and this work aims to compile such a comprehensive chemogenomics dataset. This dataset comprises over 70 million SAR data points from publicly available databases (PubChem and ChEMBL) including structure, target information and activity annotations. Our aspiration is to create a useful chemogenomics resource reflecting industry-scale data not only for building predictive models of in silico polypharmacology and off-target effects but also for the validation of cheminformatics approaches in general. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13321-017-0203-5) contains supplementary material, which is available to authorized users.
Springer International Publishing 2017-03-07 /pmc/articles/PMC5340785/ /pubmed/28316655 http://dx.doi.org/10.1186/s13321-017-0203-5 Text en © The Author(s) 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Database Sun, Jiangming Jeliazkova, Nina Chupakhin, Vladimir Golib-Dzib, Jose-Felipe Engkvist, Ola Carlsson, Lars Wegner, Jörg Ceulemans, Hugo Georgiev, Ivan Jeliazkov, Vedrin Kochev, Nikolay Ashby, Thomas J. Chen, Hongming ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics |
title | ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics |
topic | Database |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5340785/ https://www.ncbi.nlm.nih.gov/pubmed/28316655 http://dx.doi.org/10.1186/s13321-017-0203-5 |