
Dataset of Karakalpak language stop words

The dataset presented in this paper addresses the challenge of automatically extracting stop words in Natural Language Processing (NLP) for Karakalpak, a low-resource language spoken by approximately two million people in Uzbekistan. To accomplish this, we have created a corpus of 23 Karakalpa...


Bibliographic Details
Main authors:	Madatov, Khabibulla; Bekchanov, Shukurla; Vičič, Jernej
Format:	Online Article Text
Language:	English
Published:	Elsevier 2023
Subjects:	Data Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10126844/ https://www.ncbi.nlm.nih.gov/pubmed/37113499 http://dx.doi.org/10.1016/j.dib.2023.109111

Similar Items

Dataset of stopwords extracted from Uzbek texts
by: Madatov, Khabibulla, et al.
Published: (2022)

Enhancing text pre-processing for Swahili language: Datasets for common Swahili stop-words, slangs and typos with equivalent proper words
by: Masua, Bernard, et al.
Published: (2020)

MyWSL: Malaysian words sign language dataset
by: Johari, Rina Tasia, et al.
Published: (2023)

BdSLW-11: Dataset of Bangladeshi sign language words for recognizing 11 daily useful BdSL words
by: Islam, Md. Monirul, et al.
Published: (2022)

Data about fall events and ordinary daily activities from a sensorized smart floor
by: Tošić, Aleksandar, et al.
Published: (2021)

Pashtu Language Digits Dataset
by: Khan, Rehan Ullah, et al.
Published: (2022)

Dataset for classifying English words into difficulty levels by undergraduate and postgraduate students
by: Kangoo, Nisar Ahmad, et al.
Published: (2023)

Arabic handwritten alphabets, words and paragraphs per user (AHAWP) dataset
by: Khan, Majid Ali
Published: (2022)

Word-timestamped transcripts of two spoken narrative recall functional neuroimaging datasets
by: Born, Savannah J., et al.
Published: (2023)

A dataset for plain language adaptation of biomedical abstracts
by: Attal, Kush, et al.
Published: (2023)

An EMG dataset for Arabic sign language alphabet letters and numbers
by: Ben Haj Amor, Amina, et al.
Published: (2023)

BDSL 49: A comprehensive dataset of Bangla sign language
by: Hasib, Ayman, et al.
Published: (2023)

Low resolution thermal imaging dataset of sign language digits
by: Yeduri, Sreenivasa Reddy, et al.
Published: (2022)

Neural Responses to Novel and Existing Words in Children with Autism Spectrum and Developmental Language Disorder
by: Knowland, Victoria C. P., et al.
Published: (2022)

A behavioural dataset for studying individual differences in language skills
by: Hintz, Florian, et al.
Published: (2020)

Psycholinguistic dataset on language use in 1145 novels published in English and Dutch
by: Luoto, Severi, et al.
Published: (2020)

Dataset of factors impacting second language learning from Teachers' experience
by: Arigita-García, Amaya, et al.
Published: (2021)

BanglaSER: A speech emotion recognition dataset for the Bangla language
by: Das, Rakesh Kumar, et al.
Published: (2022)

Human-annotated dataset for social media sentiment analysis for Albanian language
by: Kadriu, Fatbardh, et al.
Published: (2022)

A natural language fMRI dataset for voxelwise encoding models
by: LeBel, Amanda, et al.
Published: (2023)

Grammars Across Time Analyzed (GATA): a dataset of 52 languages
by: Blum, Frederic, et al.
Published: (2023)

A 204-subject multimodal neuroimaging dataset to study language processing
by: Schoffelen, Jan-Mathijs, et al.
Published: (2019)

The “Narratives” fMRI dataset for evaluating models of naturalistic language comprehension
by: Nastase, Samuel A., et al.
Published: (2021)

A synchronized multimodal neuroimaging dataset for studying brain language processing
by: Wang, Shaonan, et al.
Published: (2022)

Security exchange commission forms K-10 filings – Positive and negative word occurrence dataset 1995–2008
by: Staszkiewicz, Piotr, et al.
Published: (2022)

UDDIPOK: A reading comprehension based question answering dataset in Bangla language
by: Aurpa, Tanjim Taharat, et al.
Published: (2023)

An online multilingual numeral dataset on Devnagari and English languages for pattern recognition research
by: Jabde, Meenal K., et al.
Published: (2023)

BTSD: A curated transformation of sentence dataset for text classification in Bangla language
by: Das, Rajesh Kumar, et al.
Published: (2023)

A surface electromyography and inertial measurement unit dataset for the Italian Sign Language alphabet
by: Pacifici, Iacopo, et al.
Published: (2020)

A test-retest fMRI dataset for motor, language and spatial attention functions
by: Gorgolewski, Krzysztof J, et al.
Published: (2013)

Dataset on the relationship between students’ attitude towards, and performance in mathematics word problems, mediated by active learning heuristic problem-solving approach
by: Wakhata, Robert, et al.
Published: (2023)

A 10-hour within-participant magnetoencephalography narrative dataset to test models of language comprehension
by: Armeni, Kristijan, et al.
Published: (2022)

Dataset of Pakistan Sign Language and Automatic Recognition of Hand Configuration of Urdu Alphabet through Machine Learning
by: Imran, Ali, et al.
Published: (2021)

ToxLex_bn: A curated dataset of bangla toxic language derived from Facebook comment
by: Rashid, Mohammad Mamun Or
Published: (2022)

Dataset from Code-switching between English and Malay Languages in Malaysian Premier Polytechnics ESL Classrooms
by: Mohamed Mokhtar, Mazlin, et al.
Published: (2022)

A longitudinal neuroimaging dataset on language processing in children ages 5, 7, and 9 years old
by: Wang, Jin, et al.
Published: (2022)

Dataset on amelogenesis-related genes variants (ENAM and ENAM interacting genes) and on human leukocyte antigen alleles (DQ2 and DQ8) distribution in children with and without molar-incisor hypomineralisation (MIH)
by: Hočevar, Luka, et al.
Published: (2020)

BioWordVec, improving biomedical word embeddings with subword information and MeSH
by: Zhang, Yijia, et al.
Published: (2019)

SAM 40: Dataset of 40 subject EEG recordings to monitor the induced-stress while performing Stroop color-word test, arithmetic task, and mirror image recognition task
by: Ghosh, Rajdeep, et al.
Published: (2022)

Linguistically annotated dataset for four official South African languages with a conjunctive orthography: IsiNdebele, isiXhosa, isiZulu, and Siswati
by: Gaustad, Tanja, et al.
Published: (2022)
