Automatically identifying the function and intent of posts in underground forums

The automatic classification of posts from hacking-related online forums is of potential value for the understanding of user behaviour in social networks relating to cybercrime. We designed annotation schema to label forum posts for three properties: post type, author intent, and addressee. The post...

Bibliographic Details
Main authors: Caines, Andrew; Pastrana, Sergio; Hutchings, Alice; Buttery, Paula J.
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6404783/
https://www.ncbi.nlm.nih.gov/pubmed/30931233
http://dx.doi.org/10.1186/s40163-018-0094-4
author Caines, Andrew
Pastrana, Sergio
Hutchings, Alice
Buttery, Paula J.
collection PubMed
description The automatic classification of posts from hacking-related online forums is of potential value for the understanding of user behaviour in social networks relating to cybercrime. We designed annotation schema to label forum posts for three properties: post type, author intent, and addressee. The post type indicates whether the text is a question, a comment, and so on. The author’s intent in writing the post could be positive, negative, moderating discussion, showing gratitude to another user, etc. The addressee of a post tends to be a general audience (e.g. other forum users) or individual users who have already contributed to a threaded discussion. We manually annotated a sample of posts and returned substantial agreement for post type and addressee, and fair agreement for author intent. We trained rule-based (logical) and machine learning (statistical) classification models to predict these labels automatically, and found that a hybrid logical–statistical model performs best for post type and author intent, whereas a purely statistical model is best for addressee. We discuss potential applications for this data, including the analysis of thread conversations in forum data and the identification of key actors within social networks.
format Online
Article
Text
id pubmed-6404783
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-6404783 2019-03-27 Automatically identifying the function and intent of posts in underground forums Caines, Andrew; Pastrana, Sergio; Hutchings, Alice; Buttery, Paula J. Crime Sci Research
Springer Berlin Heidelberg 2018-11-29 2018 /pmc/articles/PMC6404783/ /pubmed/30931233 http://dx.doi.org/10.1186/s40163-018-0094-4 Text en © The Author(s) 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title Automatically identifying the function and intent of posts in underground forums
topic Research