
Enhancing text pre-processing for Swahili language: Datasets for common Swahili stop-words, slangs and typos with equivalent proper words

Natural Language Processing requires data to be pre-processed to guarantee quality models in different machine learning tasks. However, the Swahili language has been disadvantaged and is classified as a low-resource language because of inadequate data for NLP, especially basic textual datasets that are us...
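The abstract above is truncated and does not spell out the pre-processing pipeline, so the following is only a minimal sketch of how lookup datasets of this kind are commonly applied. It assumes the slang/typo data takes the form of a "variant → proper word" mapping and the stop-word data a plain word set. The table contents and the names `STOP_WORDS`, `NORMALIZATION`, `tokenize` and `preprocess` are hypothetical illustrations, not entries or code from the published datasets.

```python
import re

# Hypothetical, tiny illustrative tables; NOT taken from the published datasets.
STOP_WORDS = {"na", "ya", "kwa", "ni", "wa"}
NORMALIZATION = {
    "habar": "habari",  # hypothetical typo -> proper word
    "asnte": "asante",  # hypothetical typo -> proper word
}


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of ASCII letters/apostrophes as tokens.
    return re.findall(r"[a-z']+", text.lower())


def preprocess(text: str) -> list[str]:
    tokens = tokenize(text)
    # Replace slang/typos with their proper forms first, so a corrected token
    # that turns out to be a stop word is also dropped in the next step.
    normalized = [NORMALIZATION.get(tok, tok) for tok in tokens]
    # Remove stop words from the normalized token stream.
    return [tok for tok in normalized if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("Habar ya leo? Asnte sana na kwaheri."))
```

Normalizing before filtering is a design choice in this sketch: it lets one stop-word list cover misspelled or slang variants of function words without listing each variant separately.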


Bibliographic Details
Main authors: Masua, Bernard; Masasi, Noel
Format: Online Article Text
Language: English
Published: Elsevier, 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7689026/
https://www.ncbi.nlm.nih.gov/pubmed/33294515
http://dx.doi.org/10.1016/j.dib.2020.106517