Word2vec convolutional neural networks for classification of news articles and tweets

Big web data from sources including online news and Twitter are good resources for investigating deep learning. However, collected news articles and tweets almost certainly contain data unnecessary for learning, which hinders accurate learning. This paper explores the performance of word2vec Con...

Bibliographic Details
Main Authors: Jang, Beakcheol, Kim, Inhwan, Kim, Jong Wook
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6705863/
https://www.ncbi.nlm.nih.gov/pubmed/31437181
http://dx.doi.org/10.1371/journal.pone.0220976
_version_ 1783445640211070976
author Jang, Beakcheol
Kim, Inhwan
Kim, Jong Wook
author_facet Jang, Beakcheol
Kim, Inhwan
Kim, Jong Wook
author_sort Jang, Beakcheol
collection PubMed
description Big web data from sources including online news and Twitter are good resources for investigating deep learning. However, collected news articles and tweets almost certainly contain data unnecessary for learning, which hinders accurate learning. This paper explores the performance of word2vec Convolutional Neural Networks (CNNs) in classifying news articles and tweets as related or unrelated. Using the two word embedding algorithms of word2vec, Continuous Bag-of-Words (CBOW) and Skip-gram, we constructed a CNN with the CBOW model and a CNN with the Skip-gram model. We measured the classification accuracy of the CNN with CBOW, the CNN with Skip-gram, and a CNN without word2vec on real news articles and tweets. The experimental results indicated that word2vec significantly improved the accuracy of the classification model. The accuracy of the CBOW model was higher and more stable than that of the Skip-gram model. The CBOW model exhibited better performance on news articles, and the Skip-gram model exhibited better performance on tweets. Specifically, the CNN with word2vec models was more effective on news articles than on tweets because news articles are typically more uniform than tweets.
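The abstract contrasts the two word2vec training schemes without showing how they differ. As a minimal sketch (not the authors' code; the function names, the `window` parameter, and the sample sentence are assumptions made for illustration), the following shows how each scheme turns a token sequence into training examples: CBOW predicts a center word from its surrounding context, while Skip-gram predicts each context word from the center word.

```python
from typing import List, Tuple


def context_of(tokens: List[str], i: int, window: int) -> List[str]:
    """Return up to `window` tokens on each side of position i."""
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: the context words jointly predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        context = context_of(tokens, i, window)
        if context:  # skip single-token inputs that have no context
            examples.append((context, center))
    return examples


def skipgram_examples(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: the center word predicts each context word separately."""
    examples = []
    for i, center in enumerate(tokens):
        for ctx in context_of(tokens, i, window):
            examples.append((center, ctx))
    return examples


if __name__ == "__main__":
    # Hypothetical sample sentence, chosen only for illustration.
    sentence = "stocks fell sharply today".split()
    for context, target in cbow_examples(sentence, window=1):
        print("CBOW     ", context, "->", target)
    for center, target in skipgram_examples(sentence, window=1):
        print("Skip-gram", center, "->", target)
```

In this sketch CBOW yields one example per token position, whereas Skip-gram yields one example per (center, context) pair, so Skip-gram produces more and finer-grained training pairs from the same text. The embeddings learned from either scheme would then serve as the input layer of the CNN classifier described in the abstract.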
format Online
Article
Text
id pubmed-6705863
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6705863 2019-09-04 Word2vec convolutional neural networks for classification of news articles and tweets Jang, Beakcheol Kim, Inhwan Kim, Jong Wook PLoS One Research Article Public Library of Science 2019-08-22 /pmc/articles/PMC6705863/ /pubmed/31437181 http://dx.doi.org/10.1371/journal.pone.0220976 Text en © 2019 Jang et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Jang, Beakcheol
Kim, Inhwan
Kim, Jong Wook
Word2vec convolutional neural networks for classification of news articles and tweets
title Word2vec convolutional neural networks for classification of news articles and tweets
title_sort word2vec convolutional neural networks for classification of news articles and tweets
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6705863/
https://www.ncbi.nlm.nih.gov/pubmed/31437181
http://dx.doi.org/10.1371/journal.pone.0220976