Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML
Since Turkish is an agglutinative language and contains reduplications, idioms, and metaphors, Turkish texts are sources of information with extremely rich meanings. For this reason, processing and classifying Turkish texts according to their characteristics is both time-consuming and difficult. In this study, the performance of pre-trained language models for multi-label text classification using AutoTrain was compared on a 250K Turkish dataset that we created. The results showed that the BERTurk (uncased, 128k) language model achieved higher accuracy on the dataset than the other models, with a training time of 66 min, and its CO2 emissions were quite low. The ConvBERTurk mC4 (uncased) model was the second best-performing language model. As a result of this study, we have provided a deeper understanding of the capabilities of pre-trained language models for Turkish in machine learning.
Main Authors: Savci, Pinar; Das, Bihter
Format: Online Article Text
Language: English
Published: Elsevier, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10176029/ https://www.ncbi.nlm.nih.gov/pubmed/37187909 http://dx.doi.org/10.1016/j.heliyon.2023.e15670
_version_ | 1785040346494795776
author | Savci, Pinar; Das, Bihter
collection | PubMed |
description | Since Turkish is an agglutinative language and contains reduplications, idioms, and metaphors, Turkish texts are sources of information with extremely rich meanings. For this reason, processing and classifying Turkish texts according to their characteristics is both time-consuming and difficult. In this study, the performance of pre-trained language models for multi-label text classification using AutoTrain was compared on a 250K Turkish dataset that we created. The results showed that the BERTurk (uncased, 128k) language model achieved higher accuracy on the dataset than the other models, with a training time of 66 min, and its CO2 emissions were quite low. The ConvBERTurk mC4 (uncased) model was the second best-performing language model. As a result of this study, we have provided a deeper understanding of the capabilities of pre-trained language models for Turkish in machine learning.
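For context, the multi-label setup described in the abstract differs from single-label classification in that the model scores each label independently, so a text can receive zero, one, or several labels. A minimal sketch of that decision step, assuming per-label sigmoid scores and a 0.5 threshold (the category names and logit values below are illustrative, not taken from the paper):

```python
import math

def multilabel_predict(logits, labels, threshold=0.5):
    """Map raw per-label logits to the set of assigned labels.

    Each label is decided independently via a sigmoid, so a text can
    receive zero, one, or several labels -- the defining property of
    multi-label (as opposed to multi-class) classification.
    """
    assigned = []
    for logit, label in zip(logits, labels):
        prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid squashes logit to (0, 1)
        if prob >= threshold:
            assigned.append(label)
    return assigned

# Illustrative example with hypothetical categories and logits.
labels = ["economy", "sports", "politics"]
print(multilabel_predict([2.0, -1.5, 0.3], labels))  # → ['economy', 'politics']
```

In a transformer fine-tuning setup such as the one the study describes, these logits would come from the classification head, with one output unit per label trained under a per-label binary loss rather than a softmax over all labels.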
format | Online Article Text |
id | pubmed-10176029 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-10176029 2023-05-13 Heliyon Research Article Elsevier 2023-05-01 /pmc/articles/PMC10176029/ /pubmed/37187909 http://dx.doi.org/10.1016/j.heliyon.2023.e15670 Text en © 2023 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title | Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10176029/ https://www.ncbi.nlm.nih.gov/pubmed/37187909 http://dx.doi.org/10.1016/j.heliyon.2023.e15670 |