
Unicode-8 based linguistics data set of annotated Sindhi text

The Sindhi Unicode-8 based linguistics data set is a multi-class, multi-featured data set. It was developed to address natural language processing (NLP) and linguistics problems of the Sindhi language. The data set presents information on the grammatical and morphological structure of Sindhi language text as...


Bibliographic Details
Main Authors: Dootio, Mazhar Ali; Wagan, Asim Imdad
Format: Online Article Text
Language: English
Published: Elsevier 2018
Subjects: Computer Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6139473/
https://www.ncbi.nlm.nih.gov/pubmed/30225294
http://dx.doi.org/10.1016/j.dib.2018.05.062
author Dootio, Mazhar Ali
Wagan, Asim Imdad
collection PubMed
description The Sindhi Unicode-8 based linguistics data set is a multi-class, multi-featured data set. It was developed to address natural language processing (NLP) and linguistics problems of the Sindhi language. The data set presents information on the grammatical and morphological structure of Sindhi language text as well as the sentiment polarity of Sindhi lexicons. The data set may therefore be used for information retrieval, machine translation, lexicon analysis, language modeling analysis, grammatical and morphological analysis, and semantic and sentiment analysis.
format Online
Article
Text
id pubmed-6139473
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-6139473 2018-09-17. Unicode-8 based linguistics data set of annotated Sindhi text. Dootio, Mazhar Ali; Wagan, Asim Imdad. Data Brief. Computer Science. Elsevier 2018-05-22. /pmc/articles/PMC6139473/ /pubmed/30225294 http://dx.doi.org/10.1016/j.dib.2018.05.062 Text en. © 2018 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Unicode-8 based linguistics data set of annotated Sindhi text
topic Computer Science