Medical dataset classification for Kurdish short text over social media

Bibliographic details
Main authors: Saeed, Ari M.; Hussein, Shnya R.; Ali, Chro M.; Rashid, Tarik A.
Format: Online article (text)
Language: English
Published: Elsevier, 2022
Subjects: Data Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8980624/
https://www.ncbi.nlm.nih.gov/pubmed/35392621
http://dx.doi.org/10.1016/j.dib.2022.108089
Description: Facebook was used as the source for collecting the comments in this dataset. The dataset consists of 6756 comments, which form the Medical Kurdish Dataset (MKD). The samples are user comments gathered from posts on pages in several categories (Medical, News, Economy, Education, and Sport). Six preprocessing steps were applied to the raw dataset to clean the comments and remove noise by replacing characters. For text classification, each comment (short text) is labeled as either the positive class (medical comment) or the negative class (non-medical comment). The negative class makes up 55% of the samples and the positive class 45%.
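The description says only that noise was removed "by replacing characters" and that each comment carries a binary label; the six steps themselves are not listed in this record. The Python sketch below is hypothetical: the character map, `normalize`, and `class_ratio` are illustrative assumptions, not the authors' pipeline. It shows what character-replacement cleaning and a class-balance check could look like. The substitutions shown are ones often applied to Arabic-script Kurdish text.

```python
from collections import Counter

# HYPOTHETICAL character map: the MKD paper's actual six preprocessing steps
# are not given in this record. These substitutions map Arabic code points to
# the forms typically used in Kurdish text, purely for illustration.
CHAR_MAP = str.maketrans({
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06cc",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u0640": None,      # ARABIC TATWEEL (kashida) -> deleted
})


def normalize(comment: str) -> str:
    """Replace noisy characters, then collapse runs of whitespace to one space."""
    return " ".join(comment.translate(CHAR_MAP).split())


def class_ratio(labels):
    """Return each label's share of the total, e.g. {'medical': 0.45, ...}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    print(normalize("\u0643\u0640\u062a  ab"))  # kaf and tatweel replaced, spaces collapsed
    print(class_ratio(["medical"] * 9 + ["non-medical"] * 11))
```

If the stated percentages hold, running `class_ratio` over the full set of MKD labels should give roughly 0.55 for the negative class and 0.45 for the positive class.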
Journal: Data Brief (Data in Brief), Data Article; published online 2022-03-23
PubMed record: pubmed-8980624 (2022-04-06); PMC8980624; PMID 35392621; DOI 10.1016/j.dib.2022.108089
License: © 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).