Enhancing text pre-processing for Swahili language: Datasets for common Swahili stop-words, slangs and typos with equivalent proper words
Natural Language Processing requires data to be pre-processed to guarantee quality models in different machine learning tasks. However, the Swahili language has been disadvantaged and is classified as a low-resource language because of inadequate data for NLP, especially basic textual datasets that are us...
Main Authors: | Masua, Bernard, Masasi, Noel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7689026/ https://www.ncbi.nlm.nih.gov/pubmed/33294515 http://dx.doi.org/10.1016/j.dib.2020.106517 |
Similar Items
- Say it in swahili
  by: Zawawi, M Sharifa
  Published: (1972)
- Words and Slang
  by: Chandler, J. B.
  Published: (1881)
- Dataset of Karakalpak language stop words
  by: Madatov, Khabibulla, et al.
  Published: (2023)
- When Did the Swahili Become Maritime?
  by: Fleisher, Jeffrey, et al.
  Published: (2015)
- Enhancing African low-resource languages: Swahili data for language modelling
  by: Shikali, Casper S., et al.
  Published: (2020)