
A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction

With the continuous accumulation of biological data, more and more machine learning algorithms have been introduced into the field of gene function prediction, which has great significance in decoding the secret of life. Recently, a multi-label supervised topic model named labeled latent Dirichlet a...


Bibliographic Details
Main Authors: Liu, Lin; Tang, Lin; Jin, Xin; Zhou, Wei
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356783/
https://www.ncbi.nlm.nih.gov/pubmed/30658497
http://dx.doi.org/10.3390/genes10010057
_version_ 1783391635612106752
author Liu, Lin
Tang, Lin
Jin, Xin
Zhou, Wei
collection PubMed
description With the continuous accumulation of biological data, more and more machine learning algorithms have been introduced into the field of gene function prediction, which has great significance in decoding the secret of life. Recently, a multi-label supervised topic model named labeled latent Dirichlet allocation (LLDA) has been applied to gene function prediction, and has obtained more accurate and explainable predictions than conventional methods. Nonetheless, the LLDA model is only able to construct a bag of amino acid words as a classification feature, and does not support any other features, such as hydrophobicity, which has a profound impact on gene function. To achieve more accurate probabilistic modeling of gene function, we propose a multi-label supervised topic model conditioned on arbitrary features, named Dirichlet multinomial regression LLDA (DMR-LLDA), for introducing multiple types of features into the process of topic modeling. Based on the DMR framework, DMR-LLDA applies an exponential a priori construction, previously with weighted features, on the hyper-parameters of the gene-topic distribution, so as to reflect the effects of extra features on the function probability distribution. In the five-fold cross-validation experiment on a yeast dataset, DMR-LLDA outperforms the compared model significantly. All of these experiments demonstrate the effectiveness and potential value of DMR-LLDA for predicting gene function.
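Illustrative note on the description above: the "exponential a priori construction" can be read as the standard Dirichlet multinomial regression prior, in which the Dirichlet hyper-parameter for topic k of a gene is exp(x . lambda_k), where x is the gene's feature vector and lambda_k is a weight vector for that topic. The sketch below is a minimal illustration under that assumption, not the authors' implementation. The function names, the label mask used to restrict topics to a gene's annotated functions (the LLDA constraint), and all numeric values are hypothetical.

```python
import math


def dmr_llda_alpha(features, weights, label_mask):
    """Return per-topic Dirichlet hyper-parameters for one gene.

    features:   list of floats, the gene's extra features (assumed here to be
                a bias term plus a made-up hydrophobicity score)
    weights:    one weight vector per topic (hypothetical values)
    label_mask: 1 if the topic is one of the gene's annotated functions,
                0 otherwise (the LLDA restriction during training)
    """
    alpha = []
    for topic_weights, allowed in zip(weights, label_mask):
        # Linear score of the features under this topic's weights.
        score = sum(x * w for x, w in zip(features, topic_weights))
        # Exponential link keeps the hyper-parameter positive;
        # topics outside the label set get no prior mass.
        alpha.append(math.exp(score) if allowed else 0.0)
    return alpha


def expected_topic_proportions(alpha):
    """Mean of a Dirichlet with parameters alpha: alpha_k / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical example: three candidate functions, two of them annotated.
features = [1.0, 0.5]
weights = [[0.0, 0.0], [0.0, 2.0], [1.0, 0.0]]
label_mask = [1, 1, 0]

alpha = dmr_llda_alpha(features, weights, label_mask)
theta = expected_topic_proportions(alpha)
```

With these made-up numbers, the second function receives a larger prior weight than the first because its weight on the hydrophobicity feature is positive, which is the sense in which extra features shift the function probability distribution.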
format Online
Article
Text
id pubmed-6356783
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6356783 2019-02-04 A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction Liu, Lin Tang, Lin Jin, Xin Zhou, Wei Genes (Basel) Article MDPI 2019-01-17 /pmc/articles/PMC6356783/ /pubmed/30658497 http://dx.doi.org/10.3390/genes10010057 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356783/
https://www.ncbi.nlm.nih.gov/pubmed/30658497
http://dx.doi.org/10.3390/genes10010057