WET: Word embedding-topic distribution vectors for MOOC video lectures dataset

In this article, we present a dataset containing word embeddings and document topic distribution vectors generated from MOOC video lecture transcripts. Transcripts of 12,032 video lectures from 200 courses were collected from the Coursera learning platform. This large corpus of transcripts was used as...

Bibliographic Details
Main Authors:	Kastrati, Zenun; Kurti, Arianit; Imran, Ali Shariq
Format:	Online Article Text
Language:	English
Published:	Elsevier 2020
Subjects:	Computer Science
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6950834/ https://www.ncbi.nlm.nih.gov/pubmed/31921958 http://dx.doi.org/10.1016/j.dib.2019.105090

_version_	1783486162434785280
author	Kastrati, Zenun; Kurti, Arianit; Imran, Ali Shariq
author_facet	Kastrati, Zenun; Kurti, Arianit; Imran, Ali Shariq
author_sort	Kastrati, Zenun
collection	PubMed
description	In this article, we present a dataset containing word embeddings and document topic distribution vectors generated from MOOC video lecture transcripts. Transcripts of 12,032 video lectures from 200 courses were collected from the Coursera learning platform. This large corpus of transcripts was used as input to two well-known NLP techniques, Word2Vec and Latent Dirichlet Allocation (LDA), to generate word embeddings and topic vectors, respectively. We used the Word2Vec and LDA implementations in the Gensim package for Python. The data presented in this article are related to the research article entitled “Integrating word embeddings and document topics with deep learning in a video classification framework” [1]. The dataset is hosted in the Mendeley Data repository [2].
format	Online Article Text
id	pubmed-6950834
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-6950834 2020-01-09. WET: Word embedding-topic distribution vectors for MOOC video lectures dataset. Kastrati, Zenun; Kurti, Arianit; Imran, Ali Shariq. Data Brief. Computer Science. Elsevier 2020-01-03. /pmc/articles/PMC6950834/ /pubmed/31921958 http://dx.doi.org/10.1016/j.dib.2019.105090 Text en © 2020 The Author(s). This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Computer Science. Kastrati, Zenun; Kurti, Arianit; Imran, Ali Shariq. WET: Word embedding-topic distribution vectors for MOOC video lectures dataset
title	WET: Word embedding-topic distribution vectors for MOOC video lectures dataset
title_full	WET: Word embedding-topic distribution vectors for MOOC video lectures dataset
title_fullStr	WET: Word embedding-topic distribution vectors for MOOC video lectures dataset
title_full_unstemmed	WET: Word embedding-topic distribution vectors for MOOC video lectures dataset
title_short	WET: Word embedding-topic distribution vectors for MOOC video lectures dataset
title_sort	wet: word embedding-topic distribution vectors for mooc video lectures dataset
topic	Computer Science
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6950834/ https://www.ncbi.nlm.nih.gov/pubmed/31921958 http://dx.doi.org/10.1016/j.dib.2019.105090
work_keys_str_mv	AT kastratizenun wetwordembeddingtopicdistributionvectorsformoocvideolecturesdataset AT kurtiarianit wetwordembeddingtopicdistributionvectorsformoocvideolecturesdataset AT imranalishariq wetwordembeddingtopicdistributionvectorsformoocvideolecturesdataset

Similar Items