CFam: a chemical families database based on iterative selection of functional seeds and seed-directed compound clustering

Similarity-based clustering and classification of compounds enable the search of drug leads and the structural and chemogenomic studies for facilitating chemical, biomedical, agricultural, material and other industrial applications. A database that organizes compounds into similarity-based as well a...

Bibliographic Details
Main authors: Zhang, Cheng; Tao, Lin; Qin, Chu; Zhang, Peng; Chen, Shangying; Zeng, Xian; Xu, Feng; Chen, Zhe; Yang, Sheng Yong; Chen, Yu Zong
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383987/
https://www.ncbi.nlm.nih.gov/pubmed/25414339
http://dx.doi.org/10.1093/nar/gku1212
_version_ 1782364828135849984
author Zhang, Cheng
Tao, Lin
Qin, Chu
Zhang, Peng
Chen, Shangying
Zeng, Xian
Xu, Feng
Chen, Zhe
Yang, Sheng Yong
Chen, Yu Zong
author_sort Zhang, Cheng
collection PubMed
description Similarity-based clustering and classification of compounds enable the search for drug leads and support structural and chemogenomic studies that facilitate chemical, biomedical, agricultural, material and other industrial applications. A database that organizes compounds into similarity-based as well as scaffold-based and property-based families is useful for these tasks. The CFam Chemical Family database (http://bidd2.cse.nus.edu.sg/cfam) was developed to hierarchically cluster drugs, bioactive molecules, human metabolites, natural products, patented agents and other molecules into functional families, superfamilies and classes of structurally similar compounds based on literature-reported high, intermediate and remote similarity measures. Compounds were represented by molecular fingerprints, and molecular similarity was measured by the Tanimoto coefficient. The functional seeds of CFam families were drawn from hierarchically clustered drugs, bioactive molecules, human metabolites, natural products and patented agents, respectively, and were used to characterize families and to cluster compounds into families, superfamilies and classes. CFam currently contains 11 643 classes, 34 880 superfamilies and 87 136 families of 490 279 compounds (1691 approved drugs, 1228 clinical trial drugs, 12 386 investigative drugs, 262 881 highly active molecules, 15 055 human metabolites, 80 255 ZINC-processed natural products and 116 783 patented agents). Efforts will be made to further expand the CFam database and to add more functional categories and families based on other types of molecular representations.
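The description above mentions fingerprint-based Tanimoto similarity and seed-directed clustering. The following is a minimal sketch of those two ideas only, not the CFam implementation: fingerprints are assumed to be plain Python sets of on-bit indices, the function names `tanimoto` and `assign_to_seeds` are hypothetical, and the threshold of 0.5 is an arbitrary illustrative value (the record does not state the cut-offs used for families, superfamilies or classes).

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def assign_to_seeds(compounds, seeds, threshold):
    """Assign each compound to its most similar seed if similarity >= threshold.

    Hypothetical helper: compounds and seeds map names to fingerprint sets.
    Returns (families, unassigned), where families maps seed name -> compound names.
    """
    families = {name: [] for name in seeds}
    unassigned = []
    for cid, fp in compounds.items():
        best_seed, best_sim = None, 0.0
        for name, seed_fp in seeds.items():
            sim = tanimoto(fp, seed_fp)
            if sim > best_sim:
                best_seed, best_sim = name, sim
        if best_seed is not None and best_sim >= threshold:
            families[best_seed].append(cid)
        else:
            unassigned.append(cid)
    return families, unassigned


# Toy data (invented for illustration; not taken from CFam).
seeds = {"seed_A": {1, 2, 3, 4}, "seed_B": {10, 11, 12}}
compounds = {
    "cmpd_1": {1, 2, 3, 5},
    "cmpd_2": {10, 11, 12, 13},
    "cmpd_3": {20, 21},
}
families, unassigned = assign_to_seeds(compounds, seeds, threshold=0.5)
print(families)
print(unassigned)
```

In a hierarchical setting, the same assignment step could be repeated with progressively lower thresholds to group families into broader levels; that layering is an assumption of this sketch rather than something the record specifies.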
format Online
Article
Text
id pubmed-4383987
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4383987 2015-04-08 CFam: a chemical families database based on iterative selection of functional seeds and seed-directed compound clustering Zhang, Cheng; Tao, Lin; Qin, Chu; Zhang, Peng; Chen, Shangying; Zeng, Xian; Xu, Feng; Chen, Zhe; Yang, Sheng Yong; Chen, Yu Zong Nucleic Acids Res Database Issue
Oxford University Press 2014-11-20 2015-01-28 /pmc/articles/PMC4383987/ /pubmed/25414339 http://dx.doi.org/10.1093/nar/gku1212 Text en © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title CFam: a chemical families database based on iterative selection of functional seeds and seed-directed compound clustering
topic Database Issue
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4383987/
https://www.ncbi.nlm.nih.gov/pubmed/25414339
http://dx.doi.org/10.1093/nar/gku1212