Morpheme Matching Based Text Tokenization for a Scarce Resourced Language
Text tokenization is a fundamental pre-processing step for almost all information processing applications. This task is nontrivial for scarce-resourced languages such as Urdu, because spaces between words are used inconsistently. In this paper a morpheme matching based approach has been propo...
Main Authors: | , , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2013 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3749178/ https://www.ncbi.nlm.nih.gov/pubmed/23990871 http://dx.doi.org/10.1371/journal.pone.0068178 |