A review of semi-supervised learning for text classification

Bibliographic Details
Main authors:	Duarte, José Marcio; Berton, Lilian
Format:	Online Article Text
Language:	English
Published:	Springer Netherlands 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9887265/ https://www.ncbi.nlm.nih.gov/pubmed/36743267 http://dx.doi.org/10.1007/s10462-023-10393-8

author	Duarte, José Marcio; Berton, Lilian
collection	PubMed
description	A huge amount of data is generated daily, leading to big data challenges. One of them is related to text mining, especially text classification. To perform this task, we usually need a large set of labeled data, which can be expensive, time-consuming, or difficult to obtain. Considering this scenario, semi-supervised learning (SSL), the branch of machine learning concerned with using labeled and unlabeled data, has expanded in volume and scope. Since no recent survey exists to overview how SSL has been used in text classification, we aim to fill this gap and present an up-to-date review of SSL for text classification. We retrieved 1794 works from the last 5 years from IEEE Xplore, ACM Digital Library, Science Direct, and Springer. Then, 157 articles were selected to be included in this review. We present the application domain, datasets, and languages employed in the works, as well as the text representations and machine learning algorithms. We also summarize and organize the works following a recent taxonomy of SSL. We analyze the percentage of labeled data used, the evaluation metrics, and the results obtained. Lastly, we present some limitations and future trends in the area. We aim to provide researchers and practitioners with an outline of the area as well as useful information for their current research.
format	Online Article Text
id	pubmed-9887265
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Springer Netherlands
record_format	MEDLINE/PubMed
spelling	pubmed-9887265 2023-01-31 A review of semi-supervised learning for text classification Duarte, José Marcio; Berton, Lilian Artif Intell Rev Article Springer Netherlands 2023-01-31 /pmc/articles/PMC9887265/ /pubmed/36743267 http://dx.doi.org/10.1007/s10462-023-10393-8 Text en © The Author(s), under exclusive licence to Springer Nature B.V. 2023, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	A review of semi-supervised learning for text classification
topic	Article
