Unicode-8 based linguistics data set of annotated Sindhi text
The Sindhi Unicode-8 based linguistics data set is a multi-class, multi-featured data set. It was developed to address natural language processing (NLP) and linguistics problems for the Sindhi language. The data set presents information on the grammatical and morphological structure of Sindhi language text as...
Main authors: | Dootio, Mazhar Ali; Wagan, Asim Imdad |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2018 |
Subjects: | Computer Science |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6139473/ https://www.ncbi.nlm.nih.gov/pubmed/30225294 http://dx.doi.org/10.1016/j.dib.2018.05.062 |
_version_ | 1783355521378549760 |
---|---|
author | Dootio, Mazhar Ali; Wagan, Asim Imdad |
author_facet | Dootio, Mazhar Ali; Wagan, Asim Imdad |
author_sort | Dootio, Mazhar Ali |
collection | PubMed |
description | The Sindhi Unicode-8 based linguistics data set is a multi-class, multi-featured data set. It was developed to address natural language processing (NLP) and linguistics problems for the Sindhi language. The data set presents information on the grammatical and morphological structure of Sindhi language text as well as the sentiment polarity of Sindhi lexicons. The data set may therefore be used for information retrieval, machine translation, lexicon analysis, language modeling analysis, grammatical and morphological analysis, and semantic and sentiment analysis. |
format | Online Article Text |
id | pubmed-6139473 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-61394732018-09-17 Unicode-8 based linguistics data set of annotated Sindhi text Dootio, Mazhar Ali Wagan, Asim Imdad Data Brief Computer Science Sindhi Unicode-8 based linguistics data set is multi-class and multi-featured data set. It is developed to solve the natural languages processing (NLP) and linguistics problems of Sindhi language. The data set presents information on grammatical and morphological structure of Sindhi language text as well as sentiment polarity of Sindhi lexicons. Therefore, data set may be used for information retrieving, machine translation, lexicon analysis, language modeling analysis, grammatical and morphological analysis, Semantic and sentiment analysis. Elsevier 2018-05-22 /pmc/articles/PMC6139473/ /pubmed/30225294 http://dx.doi.org/10.1016/j.dib.2018.05.062 Text en © 2018 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Computer Science Dootio, Mazhar Ali Wagan, Asim Imdad Unicode-8 based linguistics data set of annotated Sindhi text |
title | Unicode-8 based linguistics data set of annotated Sindhi text |
title_full | Unicode-8 based linguistics data set of annotated Sindhi text |
title_fullStr | Unicode-8 based linguistics data set of annotated Sindhi text |
title_full_unstemmed | Unicode-8 based linguistics data set of annotated Sindhi text |
title_short | Unicode-8 based linguistics data set of annotated Sindhi text |
title_sort | unicode-8 based linguistics data set of annotated sindhi text |
topic | Computer Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6139473/ https://www.ncbi.nlm.nih.gov/pubmed/30225294 http://dx.doi.org/10.1016/j.dib.2018.05.062 |
work_keys_str_mv | AT dootiomazharali unicode8basedlinguisticsdatasetofannotatedsindhitext AT waganasimimdad unicode8basedlinguisticsdatasetofannotatedsindhitext |