
Proposed Model for Context Topic Identification of English and Hindi News Article Through LDA Approach with NLP Technique

According to the survey, India has the world's second-largest newspaper market, with more than 100,000 newspaper outlets, a circulation of approximately 240 million, and 1,300 million subscribers or readers. Work on topic modeling is growing steadily, and researchers have published multiple topic modeli...


Bibliographic Details
Main Authors: Srivastav, Anukriti, Singh, Satwinder
Format: Online Article Text
Language: English
Published: Springer India 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8363495/
http://dx.doi.org/10.1007/s40031-021-00655-w
_version_ 1783738365398482944
author Srivastav, Anukriti
Singh, Satwinder
author_facet Srivastav, Anukriti
Singh, Satwinder
author_sort Srivastav, Anukriti
collection PubMed
description According to the survey, India has the world's second-largest newspaper market, with more than 100,000 newspaper outlets, a circulation of approximately 240 million, and 1,300 million subscribers or readers. Work on topic modeling is growing steadily, and researchers have published multiple topic modeling papers and applied them in areas such as software engineering, political science, and medicine. LDA topic modeling is used in this research because it has been applied successfully to topic modeling and classification, and because it measures the probability of a text based on the bag-of-words scheme, without considering word order. LDA is a common topic modeling algorithm with an excellent implementation in the Gensim Python package. The challenge, however, is how to extract good-quality topics that are simple, well separated, and meaningful. The purpose of this research is to find the main topics of news articles of the same category written in two different languages (Hindi and English) and then to classify these topics across the two languages using a similarity measurement. In this research, the corpus is constructed with bigrams. To achieve the research goal, we first build a headline and link extractor that scrapes the top news from Google News feeds for both English and Hindi (Google News collects news stories that have appeared on different news websites over the last 30 days and is already accessible in 35 languages) and then analyze which two news headlines are similar.
format Online
Article
Text
id pubmed-8363495
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer India
record_format MEDLINE/PubMed
spelling pubmed-8363495 2021-08-15 Proposed Model for Context Topic Identification of English and Hindi News Article Through LDA Approach with NLP Technique Srivastav, Anukriti Singh, Satwinder J. Inst. Eng. India Ser. B Original Contribution
Springer India 2021-08-14 2022 /pmc/articles/PMC8363495/ http://dx.doi.org/10.1007/s40031-021-00655-w Text en © The Institution of Engineers (India) 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Original Contribution
Srivastav, Anukriti
Singh, Satwinder
Proposed Model for Context Topic Identification of English and Hindi News Article Through LDA Approach with NLP Technique
title Proposed Model for Context Topic Identification of English and Hindi News Article Through LDA Approach with NLP Technique
title_sort proposed model for context topic identification of english and hindi news article through lda approach with nlp technique
topic Original Contribution
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8363495/
http://dx.doi.org/10.1007/s40031-021-00655-w
work_keys_str_mv AT srivastavanukriti proposedmodelforcontexttopicidentificationofenglishandhindinewsarticlethroughldaapproachwithnlptechnique
AT singhsatwinder proposedmodelforcontexttopicidentificationofenglishandhindinewsarticlethroughldaapproachwithnlptechnique
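
The abstract in this record mentions a corpus built with bigrams and a step that analyzes which two news headlines are similar, but the record does not say which similarity measure the authors use. The following is a minimal sketch of that idea, not the authors' implementation: it assumes whitespace tokenization, a bag of unigrams plus bigrams, and cosine similarity, and it compares headlines within a single language only. The function names (`bigram_bag`, `cosine_similarity`, `most_similar_pair`) and the sample headlines are hypothetical. The LDA step with Gensim and the cross-language matching are not shown.

```python
import math
import string
from collections import Counter
from itertools import combinations


def bigram_bag(headline):
    """Return a bag (Counter) of unigrams and bigrams for one headline.

    Assumption: tokens are split on whitespace and stripped of ASCII
    punctuation; the record does not describe the authors' preprocessing.
    """
    tokens = [t.strip(string.punctuation) for t in headline.lower().split()]
    tokens = [t for t in tokens if t]
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


def cosine_similarity(bag_a, bag_b):
    """Cosine similarity between two bags; 0.0 if either bag is empty."""
    dot = sum(bag_a[k] * bag_b[k] for k in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(v * v for v in bag_a.values()))
    norm_b = math.sqrt(sum(v * v for v in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_pair(headlines):
    """Return the index pair (i, j), i < j, of the two most similar headlines."""
    bags = [bigram_bag(h) for h in headlines]
    return max(
        combinations(range(len(headlines)), 2),
        key=lambda pair: cosine_similarity(bags[pair[0]], bags[pair[1]]),
    )


if __name__ == "__main__":
    # Hypothetical sample headlines, not taken from the paper's data.
    sample = [
        "India wins the cricket series",
        "Cricket series won by India",
        "Stock markets fall sharply",
    ]
    i, j = most_similar_pair(sample)
    print(sample[i], "<->", sample[j])
```

Cosine similarity over word and bigram counts is chosen here only because it is simple and needs no external library; a topic-based comparison, as the abstract suggests, would instead compare the topic distributions produced by the LDA model.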