
Methodological proposal to identify the nationality of Twitter users through random-forests

Bibliographic details
Main authors: Quijano, Damián, Gil-Herrera, Richard
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9888676/
https://www.ncbi.nlm.nih.gov/pubmed/36719891
http://dx.doi.org/10.1371/journal.pone.0277858
author Quijano, Damián
Gil-Herrera, Richard
collection PubMed
description We present a methodology for identifying the participants in social-network discussions, and their contributions, according to a local attribute (e.g., nationality), with a reasonable degree of reliability and efficiency. This problem is challenging and has motivated several studies and some recent approximate solutions. The study addresses the problem of identifying the nationality of Twitter users who respond to an opinion request of a political or social-participation nature. The methodology uses machine learning to classify Twitter users by nationality in order to carry out opinion studies in three Central American countries. The Random Forests algorithm is used to build classification models from small training samples, using only numerical features based on the number of times different interactions among users occur. Averaged over the three countries, the inferred proportion of nationals was 77.40% in the initial data and 91.60% after applying the automatic classification model, an average increase of 14.20 percentage points. In conclusion, the proposed set of methods offers a reasonable and efficient approach to opinion studies.
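The description states that Random Forests classify users from small training samples using only counts of user interactions. The sketch below illustrates that idea in plain Python (standard library only) so it runs without extra dependencies. It is not the authors' implementation: the feature layout (interaction counts with accounts already known to be national or foreign), the labels, the hyperparameters (`n_trees=25`, `max_depth=3`, roughly √d features per split) and all function names are hypothetical, and the training rows are synthetic toy data, not results from the study.

```python
import random
from collections import Counter

# Minimal random-forest sketch: CART-style trees grown on bootstrap samples,
# a random feature subset at each split, and a majority vote over trees.
# Hypothetical illustration only; not the method code from the paper.


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return (score, feature, threshold) minimising weighted Gini, or None."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, max_depth, n_sub, rng):
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    # Consider only a random subset of features at this split.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    _, f, t = split
    lx, ly, rx, ry = [], [], [], []
    for row, lab in zip(X, y):
        if row[f] <= t:
            lx.append(row)
            ly.append(lab)
        else:
            rx.append(row)
            ry.append(lab)
    return (f, t,
            build_tree(lx, ly, max_depth - 1, n_sub, rng),
            build_tree(rx, ry, max_depth - 1, n_sub, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=3, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, n_sub, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote of all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Synthetic toy data, NOT from the paper. Each row is a hypothetical feature
# vector: [interactions with accounts known to be national,
#          interactions with accounts known to be foreign].
TRAIN_X = [[12, 1], [9, 0], [15, 2], [7, 1], [11, 3],
           [1, 10], [0, 8], [2, 14], [3, 9], [1, 12]]
TRAIN_Y = ["national"] * 5 + ["foreign"] * 5

forest = fit_forest(TRAIN_X, TRAIN_Y)
print(predict_forest(forest, [10, 0]), predict_forest(forest, [0, 12]))
```

In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally replace these hand-written trees; the from-scratch version only makes bootstrap sampling, per-split feature subsampling and majority voting visible.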
format Online
Article
Text
id pubmed-9888676
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
journal PLoS One, Research Article, published 2023-01-31
license © 2023 Quijano, Gil-Herrera. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Methodological proposal to identify the nationality of Twitter users through random-forests
topic Research Article