
Extracting glycan motifs using a biochemically-weighted kernel

Carbohydrates, or glycans, are among the most abundant and structurally diverse biopolymers and constitute the third major class of biomolecules, following DNA and proteins. However, the study of carbohydrate sugar chains has lagged behind that of DNA and proteins, mainly due to their inher...


Bibliographic Details
Main authors: Jiang, Hao; Aoki-Kinoshita, Kiyoko F; Ching, Wai-Ki
Format: Online Article Text
Language: English
Published: Biomedical Informatics 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3280441/
https://www.ncbi.nlm.nih.gov/pubmed/22347783
author Jiang, Hao
Aoki-Kinoshita, Kiyoko F
Ching, Wai-Ki
collection PubMed
description Carbohydrates, or glycans, are among the most abundant and structurally diverse biopolymers and constitute the third major class of biomolecules, following DNA and proteins. The study of carbohydrate sugar chains has nevertheless lagged behind that of DNA and proteins, mainly because of their inherent structural complexity. Their analysis is important because they play various roles in biological processes, including signal transduction and cellular recognition. To shed light on glycan function based on carbohydrate structure, kernel methods have been developed, in particular to extract potential glycan biomarkers by classifying glycan structures found in different tissue samples. The recently developed weighted q-gram method (LK-method) performs well on glycan structure classification but is limited in feature selection: it was unable to extract biologically meaningful features from the data. We therefore propose a biochemically-weighted tree kernel (BioLK-method), which is based on a glycan similarity matrix and also incorporates biochemical information about individual q-grams in constructing the kernel matrix. We applied the new method to the classification and recognition of motifs in publicly available glycan data. Used with a Support Vector Machine (SVM), the BioLK-method detects biologically important motifs accurately, whereas the LK-method failed to do so. It was tested on three glycan data sets from the Consortium for Functional Glycomics (CFG) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) GLYCAN, and the results are consistent with the literature. The BioLK-method also maintains classification performance comparable to that of the LK-method.
These results indicate that incorporating biochemical information about q-grams further shows the flexibility and capability of the novel kernel in feature extraction, which may aid in the prediction of glycan biomarkers.
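The abstract does not give the kernel's formula, so the following is only a minimal sketch of the general idea of a weighted q-gram kernel on trees, not the authors' LK- or BioLK-method. It assumes a glycan is modelled as a rooted tree of residue labels, a q-gram is a downward path of q consecutive nodes, and each q-gram has a non-negative weight. The function names, the tree encoding, the example residues, and the weighting function are all hypothetical.

```python
from collections import Counter

# Assumption: a glycan is a rooted tree written as (label, [children]).
# The residue labels below are illustrative only.

def qgrams(tree, q):
    """Count every downward path of q consecutive nodes in the tree."""
    counts = Counter()

    def paths_from(node, length):
        label, children = node
        if length == 1:
            return [(label,)]
        return [(label,) + rest
                for child in children
                for rest in paths_from(child, length - 1)]

    def visit(node):
        for path in paths_from(node, q):
            counts[path] += 1
        for child in node[1]:
            visit(child)

    visit(tree)
    return counts

def weighted_qgram_kernel(t1, t2, q, weight=lambda gram: 1.0):
    """Sum weight(g) * count1(g) * count2(g) over q-grams shared by both trees.

    `weight` is a hypothetical stand-in for per-q-gram information;
    it should return non-negative values so the kernel stays valid.
    """
    c1, c2 = qgrams(t1, q), qgrams(t2, q)
    return sum(weight(g) * c1[g] * c2[g] for g in c1.keys() & c2.keys())

# Two toy trees for illustration.
a = ("GlcNAc", [("Man", [("Man", []), ("Man", [])])])
b = ("GlcNAc", [("Man", [("Gal", [])])])

k_ab = weighted_qgram_kernel(a, b, q=2)
k_aa = weighted_qgram_kernel(a, a, q=2)
```

Evaluating such a function on every pair of structures yields a kernel (Gram) matrix, which is the form of input an SVM with a precomputed kernel expects.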
format Online
Article
Text
id pubmed-3280441
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Biomedical Informatics
record_format MEDLINE/PubMed
spelling pubmed-3280441 2012-02-17 Extracting glycan motifs using a biochemically-weighted kernel Jiang, Hao Aoki-Kinoshita, Kiyoko F Ching, Wai-Ki Bioinformation Hypothesis Biomedical Informatics 2011-12-21 /pmc/articles/PMC3280441/ /pubmed/22347783 Text en © 2011 Biomedical Informatics This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.
title Extracting glycan motifs using a biochemically-weighted kernel
topic Hypothesis
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3280441/
https://www.ncbi.nlm.nih.gov/pubmed/22347783