A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction
With the continuous accumulation of biological data, more and more machine learning algorithms have been introduced into the field of gene function prediction, which has great significance in decoding the secret of life. Recently, a multi-label supervised topic model named labeled latent Dirichlet a...
Main authors: | Liu, Lin; Tang, Lin; Jin, Xin; Zhou, Wei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356783/ https://www.ncbi.nlm.nih.gov/pubmed/30658497 http://dx.doi.org/10.3390/genes10010057 |
_version_ | 1783391635612106752 |
---|---|
author | Liu, Lin Tang, Lin Jin, Xin Zhou, Wei |
author_facet | Liu, Lin Tang, Lin Jin, Xin Zhou, Wei |
author_sort | Liu, Lin |
collection | PubMed |
description | With the continuous accumulation of biological data, more and more machine learning algorithms have been introduced into the field of gene function prediction, which has great significance in decoding the secret of life. Recently, a multi-label supervised topic model named labeled latent Dirichlet allocation (LLDA) has been applied to gene function prediction, and has obtained more accurate and explainable predictions than conventional methods. Nonetheless, the LLDA model can only use a bag of amino acid words as its classification feature, and does not support any other features, such as hydrophobicity, which has a profound impact on gene function. To achieve more accurate probabilistic modeling of gene function, we propose a multi-label supervised topic model conditioned on arbitrary features, named Dirichlet multinomial regression LLDA (DMR-LLDA), which introduces multiple types of features into the process of topic modeling. Based on the DMR framework, DMR-LLDA places an exponential prior, constructed from weighted features, on the hyperparameters of the gene-topic distribution, so as to reflect the effects of extra features on the function probability distribution. In a five-fold cross-validation experiment on a yeast dataset, DMR-LLDA significantly outperforms the compared model. These experiments demonstrate the effectiveness and potential value of DMR-LLDA for predicting gene function. |
format | Online Article Text |
id | pubmed-6356783 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-6356783 2019-02-04 A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction Liu, Lin Tang, Lin Jin, Xin Zhou, Wei Genes (Basel) Article With the continuous accumulation of biological data, more and more machine learning algorithms have been introduced into the field of gene function prediction, which has great significance in decoding the secret of life. Recently, a multi-label supervised topic model named labeled latent Dirichlet allocation (LLDA) has been applied to gene function prediction, and has obtained more accurate and explainable predictions than conventional methods. Nonetheless, the LLDA model can only use a bag of amino acid words as its classification feature, and does not support any other features, such as hydrophobicity, which has a profound impact on gene function. To achieve more accurate probabilistic modeling of gene function, we propose a multi-label supervised topic model conditioned on arbitrary features, named Dirichlet multinomial regression LLDA (DMR-LLDA), which introduces multiple types of features into the process of topic modeling. Based on the DMR framework, DMR-LLDA places an exponential prior, constructed from weighted features, on the hyperparameters of the gene-topic distribution, so as to reflect the effects of extra features on the function probability distribution. In a five-fold cross-validation experiment on a yeast dataset, DMR-LLDA significantly outperforms the compared model. These experiments demonstrate the effectiveness and potential value of DMR-LLDA for predicting gene function. MDPI 2019-01-17 /pmc/articles/PMC6356783/ /pubmed/30658497 http://dx.doi.org/10.3390/genes10010057 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Liu, Lin Tang, Lin Jin, Xin Zhou, Wei A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction |
title | A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction |
title_sort | multi-label supervised topic model conditioned on arbitrary features for gene function prediction |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6356783/ https://www.ncbi.nlm.nih.gov/pubmed/30658497 http://dx.doi.org/10.3390/genes10010057 |
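
The description above says that DMR-LLDA places an exponential prior, built from weighted features, on the hyperparameters of the gene-topic distribution. A minimal sketch of that idea follows, assuming the standard Dirichlet multinomial regression form (hyperparameter = exp of a weighted feature sum) and the usual LLDA restriction of each gene's topics to its annotated labels. The function names, feature values, weights, and label names are all hypothetical illustrations, not taken from the article, and the authors' actual formulation may differ.

```python
import math


def dmr_llda_prior(features, weights, labels):
    """Sketch of per-gene Dirichlet hyperparameters (assumed DMR form).

    features: list of floats, extra features of one gene (hypothetical values).
    weights:  dict mapping topic -> list of floats, one weight per feature.
    labels:   set of topics (functions) the gene is annotated with.
    """
    alpha = {}
    for topic, topic_weights in weights.items():
        if topic not in labels:
            continue  # LLDA-style restriction: unlabeled topics are excluded
        score = sum(x * w for x, w in zip(features, topic_weights))
        alpha[topic] = math.exp(score)  # exponential keeps alpha positive
    return alpha


def expected_topic_proportions(alpha):
    """Mean of a Dirichlet distribution with the given hyperparameters."""
    total = sum(alpha.values())
    return {topic: a / total for topic, a in alpha.items()}


# Made-up example: a bias term plus one feature (e.g. a hydrophobicity score).
features = [1.0, 0.5]
weights = {
    "transport": [0.0, 2.0],
    "binding": [0.0, 0.0],
    "repair": [1.0, 1.0],
}
labels = {"transport", "binding"}

alpha = dmr_llda_prior(features, weights, labels)
theta = expected_topic_proportions(alpha)
print(alpha)
print(theta)
```

In this toy setting the feature raises the prior weight of one labeled function relative to the other, which is the effect the description attributes to the extra features; learning the weights themselves is outside the scope of the sketch.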