KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics

BACKGROUND: In order to develop hypotheses on unknown metabolic pathways, biochemists frequently rely on literature that uses a free-text format to describe functional groups or substructures. In computational chemistry or cheminformatics, molecules are typically represented by chemical descriptors,...


Bibliographic Details
Main authors: Kotera, Masaaki, Tabei, Yasuo, Yamanishi, Yoshihiro, Moriya, Yuki, Tokimatsu, Toshiaki, Kanehisa, Minoru, Goto, Susumu
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029371/
https://www.ncbi.nlm.nih.gov/pubmed/24564846
http://dx.doi.org/10.1186/1752-0509-7-S6-S2
_version_ 1782317197580828672
author Kotera, Masaaki
Tabei, Yasuo
Yamanishi, Yoshihiro
Moriya, Yuki
Tokimatsu, Toshiaki
Kanehisa, Minoru
Goto, Susumu
author_facet Kotera, Masaaki
Tabei, Yasuo
Yamanishi, Yoshihiro
Moriya, Yuki
Tokimatsu, Toshiaki
Kanehisa, Minoru
Goto, Susumu
author_sort Kotera, Masaaki
collection PubMed
description BACKGROUND: In order to develop hypotheses on unknown metabolic pathways, biochemists frequently rely on literature that uses a free-text format to describe functional groups or substructures. In computational chemistry or cheminformatics, molecules are typically represented by chemical descriptors, i.e., vectors that summarize information on their various properties. However, it is difficult to interpret these chemical descriptors since they are not directly linked to the terminology of functional groups or substructures that biochemists use. METHODS: In this study, we used the KEGG Chemical Function (KCF) format to computationally describe biochemical substructures in seven attributes that resemble the way biochemists deal with substructures. RESULTS: We established the KCF-S (KCF-and-Substructures) format as additional structural information for KCF. Applying KCF-S to various datasets of molecules revealed dataset-specific occurrences of substructures that characterize the respective datasets. Structure-based clustering of molecules using KCF-S resulted in clusters in which molecular weights and structures were less diverse than those obtained with conventional chemical fingerprints. We further applied KCF-S to find pairs of molecules that may be converted to each other in enzymatic reactions, and KCF-S clearly improved predictive performance over that reported previously. CONCLUSIONS: KCF-S defines biochemical substructures while retaining interpretability, suggesting its potential for application in further studies in chemical bioinformatics. KCF and KCF-S can be generated automatically from the Molfile format, making it possible to handle molecules from any data source.
format Online
Article
Text
id pubmed-4029371
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4029371 2014-06-06 KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics Kotera, Masaaki Tabei, Yasuo Yamanishi, Yoshihiro Moriya, Yuki Tokimatsu, Toshiaki Kanehisa, Minoru Goto, Susumu BMC Syst Biol Research BACKGROUND: In order to develop hypotheses on unknown metabolic pathways, biochemists frequently rely on literature that uses a free-text format to describe functional groups or substructures. In computational chemistry or cheminformatics, molecules are typically represented by chemical descriptors, i.e., vectors that summarize information on their various properties. However, it is difficult to interpret these chemical descriptors since they are not directly linked to the terminology of functional groups or substructures that biochemists use. METHODS: In this study, we used the KEGG Chemical Function (KCF) format to computationally describe biochemical substructures in seven attributes that resemble the way biochemists deal with substructures. RESULTS: We established the KCF-S (KCF-and-Substructures) format as additional structural information for KCF. Applying KCF-S to various datasets of molecules revealed dataset-specific occurrences of substructures that characterize the respective datasets. Structure-based clustering of molecules using KCF-S resulted in clusters in which molecular weights and structures were less diverse than those obtained with conventional chemical fingerprints. We further applied KCF-S to find pairs of molecules that may be converted to each other in enzymatic reactions, and KCF-S clearly improved predictive performance over that reported previously. CONCLUSIONS: KCF-S defines biochemical substructures while retaining interpretability, suggesting its potential for application in further studies in chemical bioinformatics. KCF and KCF-S can be generated automatically from the Molfile format, making it possible to handle molecules from any data source.
BioMed Central 2013-12-13 /pmc/articles/PMC4029371/ /pubmed/24564846 http://dx.doi.org/10.1186/1752-0509-7-S6-S2 Text en Copyright © 2013 Kotera et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Kotera, Masaaki
Tabei, Yasuo
Yamanishi, Yoshihiro
Moriya, Yuki
Tokimatsu, Toshiaki
Kanehisa, Minoru
Goto, Susumu
KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics
title KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics
title_full KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics
title_fullStr KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics
title_full_unstemmed KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics
title_short KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics
title_sort kcf-s: kegg chemical function and substructure for improved interpretability and prediction in chemical bioinformatics
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029371/
https://www.ncbi.nlm.nih.gov/pubmed/24564846
http://dx.doi.org/10.1186/1752-0509-7-S6-S2