
Unicode-8 based linguistics data set of annotated Sindhi text

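The Sindhi Unicode-8 based linguistics data set is a multi-class, multi-featured data set developed to address natural language processing (NLP) and linguistics problems in the Sindhi language. It presents information on the grammatical and morphological structure of Sindhi text as well as the sentiment polarity of Sindhi lexicons. The data set may therefore be used for information retrieval, machine translation, lexicon analysis, language modeling analysis, grammatical and morphological analysis, and semantic and sentiment analysis.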


Bibliographic Details
Main Authors:	Dootio, Mazhar Ali; Wagan, Asim Imdad
Format:	Online Article Text
Language:	English
Published:	Elsevier 2018
Subjects:	Computer Science
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6139473/ https://www.ncbi.nlm.nih.gov/pubmed/30225294 http://dx.doi.org/10.1016/j.dib.2018.05.062

_version_	1783355521378549760
author	Dootio, Mazhar Ali Wagan, Asim Imdad
author_facet	Dootio, Mazhar Ali Wagan, Asim Imdad
author_sort	Dootio, Mazhar Ali
collection	PubMed
description	The Sindhi Unicode-8 based linguistics data set is a multi-class, multi-featured data set developed to address natural language processing (NLP) and linguistics problems in the Sindhi language. It presents information on the grammatical and morphological structure of Sindhi text as well as the sentiment polarity of Sindhi lexicons. The data set may therefore be used for information retrieval, machine translation, lexicon analysis, language modeling analysis, grammatical and morphological analysis, and semantic and sentiment analysis.
format	Online Article Text
id	pubmed-6139473
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-61394732018-09-17 Unicode-8 based linguistics data set of annotated Sindhi text Dootio, Mazhar Ali Wagan, Asim Imdad Data Brief Computer Science Sindhi Unicode-8 based linguistics data set is multi-class and multi-featured data set. It is developed to solve the natural languages processing (NLP) and linguistics problems of Sindhi language. The data set presents information on grammatical and morphological structure of Sindhi language text as well as sentiment polarity of Sindhi lexicons. Therefore, data set may be used for information retrieving, machine translation, lexicon analysis, language modeling analysis, grammatical and morphological analysis, Semantic and sentiment analysis. Elsevier 2018-05-22 /pmc/articles/PMC6139473/ /pubmed/30225294 http://dx.doi.org/10.1016/j.dib.2018.05.062 Text en © 2018 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Computer Science Dootio, Mazhar Ali Wagan, Asim Imdad Unicode-8 based linguistics data set of annotated Sindhi text
title	Unicode-8 based linguistics data set of annotated Sindhi text
topic	Computer Science
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6139473/ https://www.ncbi.nlm.nih.gov/pubmed/30225294 http://dx.doi.org/10.1016/j.dib.2018.05.062
work_keys_str_mv	AT dootiomazharali unicode8basedlinguisticsdatasetofannotatedsindhitext AT waganasimimdad unicode8basedlinguisticsdatasetofannotatedsindhitext
