Enhancing text pre-processing for Swahili language: Datasets for common Swahili stop-words, slangs and typos with equivalent proper words

Natural Language Processing requires data to be pre-processed to guarantee quality models in different machine learning tasks. However, the Swahili language has been disadvantaged and is classified as a low-resource language because of inadequate data for NLP, especially basic textual datasets that are us...

Bibliographic Details
Main Authors:	Masua, Bernard, Masasi, Noel
Format:	Online Article Text
Language:	English
Published:	Elsevier 2020
Subjects:	Data Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7689026/ https://www.ncbi.nlm.nih.gov/pubmed/33294515 http://dx.doi.org/10.1016/j.dib.2020.106517

Similar Items

Say it in swahili
by: Zawawi, M Sharifa
Published: (1972)

Words and Slang
by: Chandler, J. B.
Published: (1881)

Dataset of Karakalpak language stop words
by: Madatov, Khabibulla, et al.
Published: (2023)

When Did the Swahili Become Maritime?
by: Fleisher, Jeffrey, et al.
Published: (2015)

Enhancing African low-resource languages: Swahili data for language modelling
by: Shikali, Casper S., et al.
Published: (2020)

Translation and adaptation of the stroke-specific quality of life scale into Swahili
by: Nyanumba, Emily M., et al.
Published: (2023)

Entwined African and Asian genetic roots of medieval peoples of the Swahili coast
by: Brielle, Esther S., et al.
Published: (2023)

Dietary Diversity on the Swahili Coast: The Fauna from Two Zanzibar Trading Locales
by: Prendergast, M. E., et al.
Published: (2017)

Cross-Culture Adaptation and Psychometric Properties of the DrInC Questionnaire in Tanzanian Swahili
by: Zhao, Duan, et al.
Published: (2018)

MyWSL: Malaysian words sign language dataset
by: Johari, Rina Tasia, et al.
Published: (2023)

BdSLW-11: Dataset of Bangladeshi sign language words for recognizing 11 daily useful BdSL words
by: Islam, Md. Monirul, et al.
Published: (2022)

Minyoo Matata – The Vicious Worm – A Taenia solium Computer-Based Health-Education Tool – in Swahili
by: Trevisan, Chiara, et al.
Published: (2017)

The TYPO3 Guidebook: Understand and Use TYPO3 CMS
by: Brand, Felicity, et al.
Published: (2021)

A Qualitative Exploration of Sources of Help for Mental Illness in Arabic-, Mandarin-, and Swahili-Speaking Communities in Sydney, Australia
by: Krstanoska-Blazeska, Klimentina, et al.
Published: (2023)

Swahili translation and cultural adaptation of the pediatric patient-reported outcomes version of the common terminology criteria for adverse events (PRO-CTCAE)
by: Schroeder, Kristin M., et al.
Published: (2023)

Good Slang or Bad Slang? Embedding Internet Slang in Persuasive Advertising
by: Liu, Shixiong, et al.
Published: (2019)

Adaptation and Latent Structure of the Swahili Version of Beck Depression Inventory-II in a Low Literacy Population in the Context of HIV
by: Abubakar, Amina, et al.
Published: (2016)

Fear, faith and finances: health literacy experiences of English and Swahili speaking women newly diagnosed with breast and cervical cancer
by: Kassaman, Dinah, et al.
Published: (2022)

Validation of a culturally sensitive, Swahili-translated instrument to assess suicide risk among adults living with HIV in Tanzania
by: Minja, Linda, et al.
Published: (2023)

Adapting and usability testing of the Kansas city cardiomyopathy questionnaire (KCCQ) in a heart failure clinic in Tanzania: the Swahili KCCQ
by: Chillo, Pilly, et al.
Published: (2023)

TYPO3 Templates
by: Greenawalt, Jeremy
Published: (2010)

Cross-cultural adaptation and psychometric properties of the MMSE and MoCA questionnaires in Tanzanian Swahili for a traumatic brain injury population
by: Vissoci, Joao Ricardo Nickenig, et al.
Published: (2019)

Development and Validation of a Cross-Cultural Knowledge, Attitudes, and Practices Survey Instrument for Chronic Kidney Disease in a Swahili-Speaking Population
by: Stanifer, John W., et al.
Published: (2015)

Cross-cultural adaptation and psychometric properties of the Kessler Scale of Psychological Distress to a traumatic brain injury population in Swahili and the Tanzanian Setting
by: Vissoci, Joao Ricardo Nickenig, et al.
Published: (2018)

A mixed methods approach to adapting and evaluating the functional assessment of HIV infection (FAHI), Swahili version, for use with low literacy populations
by: Nyongesa, Moses K., et al.
Published: (2017)

Scientific Slang
Published: (1902)

Arabic handwritten alphabets, words and paragraphs per user (AHAWP) dataset
by: Khan, Majid Ali
Published: (2022)

Dataset for classifying English words into difficulty levels by undergraduate and postgraduate students
by: Kangoo, Nisar Ahmad, et al.
Published: (2023)

Swahili translation and validation of the Warwick Edinburgh Mental Wellbeing Scale (WEMWBS) in adolescents and adults taking part in the girls’ education challenge fund project in Tanzania
by: Oyebode, Oyinlola, et al.
Published: (2023)

Word-timestamped transcripts of two spoken narrative recall functional neuroimaging datasets
by: Born, Savannah J., et al.
Published: (2023)

Contemporary American slang
by: Spears, Richard A.
Published: (1991)

Dictionary of American slang
by: Chapman, Robert L., 1920-
Published: (1995)

Psychological Underpinning of Slanging
by: Kar, Sujita Kumar, et al.
Published: (2020)

Validation of a Swahili version of the World Health Organization 5-item well-being index among adults living with HIV and epilepsy in rural coastal Kenya
by: Chongwo, Esther, et al.
Published: (2018)

The Proper Definition of the Word Cure, as Applied to Medicine
by: Eve, Paul F.
Published: (1872)

The Proper Definition of the Word “Cure,” as Applied to Medicine
by: Eve, Paul F.
Published: (1872)

Equivalent Pairs of Words and Points of Connection
by: Mushtaq, Qaiser, et al.
Published: (2014)

WEClustering: word embeddings based text clustering technique for large datasets
by: Mehta, Vivek, et al.
Published: (2021)

The pocket dictionary of american slang : A popular abridgment of the dictionary of american slang
Published: (1968)

Neural Responses to Novel and Existing Words in Children with Autism Spectrum and Developmental Language Disorder
by: Knowland, Victoria C. P., et al.
Published: (2022)
