“Hybrid Topics” -- Facilitating the Interpretation of Topics Through the Addition of MeSH Descriptors to Bags of Words

Extracting and understanding information, themes and relationships from large collections of documents is an important task for biomedical researchers. Latent Dirichlet Allocation is an unsupervised topic modeling technique using the bag-of-words assumption that has been applied extensively to unvei...

Bibliographic Details
Main authors: Yu, Zhiguo; Nguyen, Thang; Dhombres, Ferdinand; Johnson, Todd; Bodenreider, Olivier
Format: Online Article Text
Language: English
Published: 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5875427/
https://www.ncbi.nlm.nih.gov/pubmed/29295179
author Yu, Zhiguo
Nguyen, Thang
Dhombres, Ferdinand
Johnson, Todd
Bodenreider, Olivier
collection PubMed
description Extracting and understanding information, themes and relationships from large collections of documents is an important task for biomedical researchers. Latent Dirichlet Allocation is an unsupervised topic modeling technique using the bag-of-words assumption that has been applied extensively to unveil hidden thematic information within large sets of documents. In this paper, we added MeSH descriptors to the bag-of-words assumption to generate ‘hybrid topics’, which are mixed vectors of words and descriptors. We evaluated this approach on the quality and interpretability of topics in both a general corpus and a specialized corpus. Our results demonstrated that the coherence of ‘hybrid topics’ is higher than that of regular bag-of-words topics in the specialized corpus. We also found that the proportion of topics that are not associated with MeSH descriptors is higher in the specialized corpus than in the general corpus.
format Online
Article
Text
id pubmed-5875427
institution National Center for Biotechnology Information
language English
publishDate 2017
record_format MEDLINE/PubMed
spelling pubmed-5875427 2018-03-29 Stud Health Technol Inform Article 2017 /pmc/articles/PMC5875427/ /pubmed/29295179 Text en http://creativecommons.org/licenses/by-nc/4.0/ This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
title “Hybrid Topics” -- Facilitating the Interpretation of Topics Through the Addition of MeSH Descriptors to Bags of Words
topic Article