
A survey on text classification: Practical perspectives on the Italian language

Text Classification methods have been improving at an unparalleled speed in the last decade thanks to the success brought about by deep learning. Historically, state-of-the-art approaches have been developed for and benchmarked against English datasets, while other languages have had to catch up and...


Bibliographic Details
Main authors:	Gasparetto, Andrea; Zangari, Alessandro; Marcuzzo, Matteo; Albarelli, Andrea
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9258888/ https://www.ncbi.nlm.nih.gov/pubmed/35793328 http://dx.doi.org/10.1371/journal.pone.0270904

_version_	1784741652207763456
author	Gasparetto, Andrea; Zangari, Alessandro; Marcuzzo, Matteo; Albarelli, Andrea
collection	PubMed
description	Text Classification methods have been improving at an unparalleled speed in the last decade thanks to the success brought about by deep learning. Historically, state-of-the-art approaches have been developed for and benchmarked against English datasets, while other languages have had to catch up and deal with inevitable linguistic challenges. This paper offers a survey with practical and linguistic connotations, showcasing the complications and challenges tied to the application of modern Text Classification algorithms to languages other than English. We engage this subject from the perspective of the Italian language, and we discuss in detail issues related to the scarcity of task-specific datasets, as well as the issues posed by the computational expensiveness of modern approaches. We substantiate this by providing an extensively researched list of available datasets in Italian, comparing it with a similarly sought list for French, which we use for comparison. In order to simulate a real-world practical scenario, we apply a number of representative methods to custom-tailored multilabel classification datasets in Italian, French, and English. We conclude by discussing results, future challenges, and research directions from a linguistically inclusive perspective.
format	Online Article Text
id	pubmed-9258888
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9258888 2022-07-07. A survey on text classification: Practical perspectives on the Italian language. Gasparetto, Andrea; Zangari, Alessandro; Marcuzzo, Matteo; Albarelli, Andrea. PLoS One. Research Article. Public Library of Science, 2022-07-06. /pmc/articles/PMC9258888/ /pubmed/35793328 http://dx.doi.org/10.1371/journal.pone.0270904 Text en. © 2022 Gasparetto et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	A survey on text classification: Practical perspectives on the Italian language
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9258888/ https://www.ncbi.nlm.nih.gov/pubmed/35793328 http://dx.doi.org/10.1371/journal.pone.0270904
