Corpus creation and language identification for code-mixed Indonesian-Javanese-English Tweets

With the massive use of social media today, mixing between languages in social media text is prevalent. In linguistics, the phenomenon of mixing languages is known as code-mixing. The prevalence of code-mixing exposes various concerns and challenges in natural language processing (NLP), including la...

Bibliographic Details
Main Authors:	Hidayatullah, Ahmad Fathan, Apong, Rosyzie Anna, Lai, Daphne T.C., Qazi, Atika
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2023
Subjects:	Computational Linguistics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10319257/ https://www.ncbi.nlm.nih.gov/pubmed/37409088 http://dx.doi.org/10.7717/peerj-cs.1312

Similar Items

A natural language processing based technique for sentiment analysis of college english corpus
by: Xu, Jingjing
Published: (2023)

RuSentiTweet: a sentiment analysis dataset of general domain tweets in Russian
by: Smetanin, Sergey
Published: (2022)

Multi-label emotion classification of Urdu tweets
by: Ashraf, Noman, et al.
Published: (2022)

Stop voicing contrast in American English: Data of individual speakers in trochaic and iambic words in different prosodic structural contexts
by: Kim, Sahyang, et al.
Published: (2018)

Coarticulatory vowel nasalization in American English: Data of individual differences in acoustic realization of vowel nasalization as a function of prosodic prominence and boundary
by: Kim, Daejin, et al.
Published: (2019)

Event detection in finance using hierarchical clustering algorithms on news and tweets
by: Carta, Salvatore, et al.
Published: (2021)

An optimized deep learning approach for suicide detection through Arabic tweets
by: Baghdadi, Nadiah A., et al.
Published: (2022)

Natural language inference for Malayalam language using language agnostic sentence representation
by: Renjit, Sara, et al.
Published: (2021)

Segmental speech error data elicited at prosodically-defined locations in tongue twisters
by: Beirne, Mary-Beth, et al.
Published: (2018)

Graphical user interface for simultaneous profiling of activity patterns in multiple neuronal subclasses
by: Parrish, R. Ryley, et al.
Published: (2018)

A visual working memory dataset collection with bootstrap Independent Component Analysis for comparison of electroencephalographic preprocessing pipelines
by: Artoni, Fiorenzo, et al.
Published: (2018)

Proteomic and functional data sets on synaptic mitochondria from rats with genetic ablation of Parkin
by: Villeneuve, Lance M., et al.
Published: (2018)

Topology of brain functional connectivity networks in posttraumatic stress disorder
by: Akiki, Teddy J., et al.
Published: (2018)

Dataset on the EEG time-frequency representation in children with different levels of mathematical achievement
by: González-Garrido, Andrés A., et al.
Published: (2018)

Heart rate variability in mental stress: The data reveal regression to the mean
by: Dimitriev, Dimitriy A., et al.
Published: (2018)

Dataset of implicit sequence learning of chunking and abstract structures
by: Fu, Qiufang, et al.
Published: (2018)

Group analysis data representing the effects of frontopolar transcranial direct current stimulation on the default mode network
by: Ahn, Jeesung, et al.
Published: (2018)

Datasets on the production and perception of underlying and epenthetic glottal stops in Maltese
by: Mitterer, Holger, et al.
Published: (2020)

Investigating cross-lingual training for offensive language detection
by: Pelicon, Andraž, et al.
Published: (2021)

Fake news detection in Urdu language using machine learning
by: Farooq, Muhammad Shoaib, et al.
Published: (2023)

Abusive language detection in youtube comments leveraging replies as conversational context
by: Ashraf, Noman, et al.
Published: (2021)

Recognition of Urdu sign language: a systematic review of the machine learning classification
by: Zahid, Hira, et al.
Published: (2022)

BengSentiLex and BengSwearLex: creating lexicons for sentiment analysis and profanity detection in low-resource Bengali language
by: Sazzed, Salim
Published: (2021)

Evaluating named entity recognition tools for extracting social networks from novels
by: Dekker, Niels, et al.
Published: (2019)

Identifying Twitter users who repost unreliable news sources with linguistic information
by: Mu, Yida, et al.
Published: (2020)

Forma mentis networks map how nursing and engineering students enhance their mindsets about innovation and health during professional growth
by: Stella, Massimo, et al.
Published: (2020)

Identifying vulgarity in Bengali social media textual content
by: Sazzed, Salim
Published: (2021)

Ask me in your own words: paraphrasing for multitask question answering
by: Hudson, G. Thomas, et al.
Published: (2021)

The role of automated evaluation techniques in online professional translator training
by: Munkova, Dasa, et al.
Published: (2021)

FrameAxis: characterizing microframe bias and intensity with word embedding
by: Kwak, Haewoon, et al.
Published: (2021)

Using of n-grams from morphological tags for fake news classification
by: Kapusta, Jozef, et al.
Published: (2021)

Developing and evaluating cybersecurity competencies for students in computing programs
by: Alammari, Abdullah, et al.
Published: (2022)

A systematic literature review on spam content detection and classification
by: Kaddoura, Sanaa, et al.
Published: (2022)

(Re)shaping online narratives: when bots promote the message of President Trump during his first impeachment
by: Galgoczy, Michael C., et al.
Published: (2022)

People’s expectations and experiences of big data collection in the Saudi context
by: Binsawad, Muhammad, et al.
Published: (2022)

A Baybayin word recognition system
by: Pino, Rodney, et al.
Published: (2021)

Evaluating keyphrase extraction algorithms for finding similar news articles using lexical similarity calculation and semantic relatedness measurement by word embedding
by: Sarwar, Talha Bin, et al.
Published: (2022)

Family literacies during the COVID-19 lockdown: Semiotic assemblages and meaning making at home
by: Zhang, Zheng, et al.
Published: (2023)

MFEE: a multi-word lexical feature enhancement framework for Chinese geological hazard event extraction
by: Gong, Jie, et al.
Published: (2023)

Comprehension of polarity of articles by citation sentiment analysis using TF-IDF and ML classifiers
by: Karim, Musarat, et al.
Published: (2022)
