KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics
BACKGROUND: To develop hypotheses about unknown metabolic pathways, biochemists frequently rely on literature that describes functional groups or substructures in free text. In computational chemistry and cheminformatics, molecules are typically represented by chemical descriptors,...
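Main Authors: | Kotera, Masaaki; Tabei, Yasuo; Yamanishi, Yoshihiro; Moriya, Yuki; Tokimatsu, Toshiaki; Kanehisa, Minoru; Goto, Susumu |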
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029371/ https://www.ncbi.nlm.nih.gov/pubmed/24564846 http://dx.doi.org/10.1186/1752-0509-7-S6-S2 |
_version_ | 1782317197580828672 |
---|---|
author | Kotera, Masaaki; Tabei, Yasuo; Yamanishi, Yoshihiro; Moriya, Yuki; Tokimatsu, Toshiaki; Kanehisa, Minoru; Goto, Susumu |
author_facet | Kotera, Masaaki; Tabei, Yasuo; Yamanishi, Yoshihiro; Moriya, Yuki; Tokimatsu, Toshiaki; Kanehisa, Minoru; Goto, Susumu |
author_sort | Kotera, Masaaki |
collection | PubMed |
description | BACKGROUND: To develop hypotheses about unknown metabolic pathways, biochemists frequently rely on literature that describes functional groups or substructures in free text. In computational chemistry and cheminformatics, molecules are typically represented by chemical descriptors, i.e., vectors that summarize information on their various properties. However, these chemical descriptors are difficult to interpret because they are not directly linked to the terminology of functional groups or substructures that biochemists use. METHODS: In this study, we used the KEGG Chemical Function (KCF) format to computationally describe biochemical substructures with seven attributes that resemble the way biochemists deal with substructures. RESULTS: We established the KCF-S (KCF-and-Substructures) format as additional structural information for KCF. Applying KCF-S to various datasets of molecules revealed substructures that appear specifically in each dataset and characterize it. Structure-based clustering of molecules using KCF-S yielded clusters in which molecular weights and structures were less diverse than in those obtained with conventional chemical fingerprints. We further applied KCF-S to find pairs of molecules that may be converted to each other in enzymatic reactions, and KCF-S clearly improved predictive performance over that presented previously. CONCLUSIONS: KCF-S defines biochemical substructures while preserving interpretability, suggesting its potential for further studies in chemical bioinformatics. KCF and KCF-S can be generated automatically from the Molfile format, making it possible to handle molecules from any data source. |
format | Online Article Text |
id | pubmed-4029371 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4029371 2014-06-06 KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics Kotera, Masaaki; Tabei, Yasuo; Yamanishi, Yoshihiro; Moriya, Yuki; Tokimatsu, Toshiaki; Kanehisa, Minoru; Goto, Susumu BMC Syst Biol Research BACKGROUND: To develop hypotheses about unknown metabolic pathways, biochemists frequently rely on literature that describes functional groups or substructures in free text. In computational chemistry and cheminformatics, molecules are typically represented by chemical descriptors, i.e., vectors that summarize information on their various properties. However, these chemical descriptors are difficult to interpret because they are not directly linked to the terminology of functional groups or substructures that biochemists use. METHODS: In this study, we used the KEGG Chemical Function (KCF) format to computationally describe biochemical substructures with seven attributes that resemble the way biochemists deal with substructures. RESULTS: We established the KCF-S (KCF-and-Substructures) format as additional structural information for KCF. Applying KCF-S to various datasets of molecules revealed substructures that appear specifically in each dataset and characterize it. Structure-based clustering of molecules using KCF-S yielded clusters in which molecular weights and structures were less diverse than in those obtained with conventional chemical fingerprints. We further applied KCF-S to find pairs of molecules that may be converted to each other in enzymatic reactions, and KCF-S clearly improved predictive performance over that presented previously. CONCLUSIONS: KCF-S defines biochemical substructures while preserving interpretability, suggesting its potential for further studies in chemical bioinformatics. KCF and KCF-S can be generated automatically from the Molfile format, making it possible to handle molecules from any data source. 
BioMed Central 2013-12-13 /pmc/articles/PMC4029371/ /pubmed/24564846 http://dx.doi.org/10.1186/1752-0509-7-S6-S2 Text en Copyright © 2013 Kotera et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Kotera, Masaaki Tabei, Yasuo Yamanishi, Yoshihiro Moriya, Yuki Tokimatsu, Toshiaki Kanehisa, Minoru Goto, Susumu KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics |
title | KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics |
title_full | KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics |
title_fullStr | KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics |
title_full_unstemmed | KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics |
title_short | KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics |
title_sort | kcf-s: kegg chemical function and substructure for improved interpretability and prediction in chemical bioinformatics |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029371/ https://www.ncbi.nlm.nih.gov/pubmed/24564846 http://dx.doi.org/10.1186/1752-0509-7-S6-S2 |
work_keys_str_mv | AT koteramasaaki kcfskeggchemicalfunctionandsubstructureforimprovedinterpretabilityandpredictioninchemicalbioinformatics AT tabeiyasuo kcfskeggchemicalfunctionandsubstructureforimprovedinterpretabilityandpredictioninchemicalbioinformatics AT yamanishiyoshihiro kcfskeggchemicalfunctionandsubstructureforimprovedinterpretabilityandpredictioninchemicalbioinformatics AT moriyayuki kcfskeggchemicalfunctionandsubstructureforimprovedinterpretabilityandpredictioninchemicalbioinformatics AT tokimatsutoshiaki kcfskeggchemicalfunctionandsubstructureforimprovedinterpretabilityandpredictioninchemicalbioinformatics AT kanehisaminoru kcfskeggchemicalfunctionandsubstructureforimprovedinterpretabilityandpredictioninchemicalbioinformatics AT gotosusumu kcfskeggchemicalfunctionandsubstructureforimprovedinterpretabilityandpredictioninchemicalbioinformatics |