
Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML

Since Turkish is an agglutinative language and contains reduplications, idioms, and metaphors, Turkish texts are sources of information with extremely rich meanings. For this reason, processing and classifying Turkish texts according to their characteristics is both time-consuming and difficult. In this study, the performances of pre-trained language models for multi-label text classification using AutoTrain were compared on a 250K Turkish dataset that we created. The results showed that the BERTurk (uncased, 128k) language model achieved higher accuracy than the other models, with a training time of 66 min and quite low CO2 emissions. The ConvBERTurk mC4 (uncased) model was the second-best-performing model. This study provides a deeper understanding of the capabilities of pre-trained language models for Turkish in machine learning.
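The study fine-tuned pre-trained Turkish BERT variants through AutoTrain, which is not reproducible in a short snippet. As a minimal illustration of the multi-label text classification setup the abstract refers to, the sketch below uses a classical scikit-learn baseline instead; the corpus, labels, and classifier choice are all hypothetical and are not the paper's method or data.

```python
# Toy multi-label text classification sketch (illustrative only).
# The paper used BERT-based models via AutoTrain on 250K Turkish texts;
# here a tiny English corpus and a TF-IDF + one-vs-rest baseline stand in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical documents, each carrying one or more topic labels.
texts = [
    "match ended in a draw",
    "new phone released today",
    "team unveils smart stadium",
    "election results announced",
]
labels = [["sports"], ["tech"], ["sports", "tech"], ["politics"]]

# Multi-label targets are a multi-hot matrix: one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # shape: (4 documents, 3 labels)

# One independent binary classifier per label (one-vs-rest).
clf = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(texts, Y)

# Predictions come back as multi-hot rows; inverse_transform maps
# them back to label tuples.
pred = clf.predict(["smart stadium hosts match"])
print(mlb.inverse_transform(pred))
```

The same multi-hot target layout is what a fine-tuned transformer with a sigmoid output head would be trained against; only the feature extractor differs.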


Bibliographic Details
Main Authors: Savci, Pinar; Das, Bihter
Format: Online Article Text
Language: English
Published: Elsevier, 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10176029/
https://www.ncbi.nlm.nih.gov/pubmed/37187909
http://dx.doi.org/10.1016/j.heliyon.2023.e15670
author Savci, Pinar
Das, Bihter
author_facet Savci, Pinar
Das, Bihter
author_sort Savci, Pinar
collection PubMed
description Since Turkish is an agglutinative language and contains reduplications, idioms, and metaphors, Turkish texts are sources of information with extremely rich meanings. For this reason, processing and classifying Turkish texts according to their characteristics is both time-consuming and difficult. In this study, the performances of pre-trained language models for multi-label text classification using AutoTrain were compared on a 250K Turkish dataset that we created. The results showed that the BERTurk (uncased, 128k) language model achieved higher accuracy than the other models, with a training time of 66 min and quite low CO2 emissions. The ConvBERTurk mC4 (uncased) model was the second-best-performing model. This study provides a deeper understanding of the capabilities of pre-trained language models for Turkish in machine learning.
format Online
Article
Text
id pubmed-10176029
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10176029 2023-05-13 Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML Savci, Pinar; Das, Bihter Heliyon Research Article Since Turkish is an agglutinative language and contains reduplications, idioms, and metaphors, Turkish texts are sources of information with extremely rich meanings. For this reason, processing and classifying Turkish texts according to their characteristics is both time-consuming and difficult. In this study, the performances of pre-trained language models for multi-label text classification using AutoTrain were compared on a 250K Turkish dataset that we created. The results showed that the BERTurk (uncased, 128k) language model achieved higher accuracy than the other models, with a training time of 66 min and quite low CO2 emissions. The ConvBERTurk mC4 (uncased) model was the second-best-performing model. This study provides a deeper understanding of the capabilities of pre-trained language models for Turkish in machine learning. Elsevier 2023-05-01 /pmc/articles/PMC10176029/ /pubmed/37187909 http://dx.doi.org/10.1016/j.heliyon.2023.e15670 Text en © 2023 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Research Article
Savci, Pinar
Das, Bihter
Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML
title Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML
title_full Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML
title_fullStr Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML
title_full_unstemmed Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML
title_short Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML
title_sort comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using automl
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10176029/
https://www.ncbi.nlm.nih.gov/pubmed/37187909
http://dx.doi.org/10.1016/j.heliyon.2023.e15670
work_keys_str_mv AT savcipinar comparisonofpretrainedlanguagemodelsintermsofcarbonemissionstimeandaccuracyinmultilabeltextclassificationusingautoml
AT dasbihter comparisonofpretrainedlanguagemodelsintermsofcarbonemissionstimeandaccuracyinmultilabeltextclassificationusingautoml