Examining Supervised Machine Learning Methods for Integer Link Weight Prediction Using Node Metadata

Bibliographic Details
Main authors: Mori, Larissa; O’Hara, Kaleigh; Pujol, Toyya A.; Ventresca, Mario
Format: Online article (text)
Language: English
Published: MDPI, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9223064/
https://www.ncbi.nlm.nih.gov/pubmed/35741562
http://dx.doi.org/10.3390/e24060842
Abstract: With the goal of understanding whether the information contained in node metadata can help in the task of link weight prediction, we investigate whether incorporating it as a similarity feature (referred to as metadata similarity) between the end nodes of a link improves the prediction accuracy of common supervised machine learning methods. In contrast with previous works, instead of normalizing the link weights, we treat them as count variables representing the number of interactions between end nodes, as this is a natural representation for many datasets in the literature. In this preliminary study, we find no significant evidence that metadata similarity improved prediction accuracy on the four empirical datasets studied. To further explore the role of node metadata in weight prediction, we synthesized weights to analyze the extreme case where the weights depend solely on the metadata of the end nodes, encoding different relationships between them with logical operators in the generation process. Under these conditions, the random forest method performed significantly better than the other methods in 99.07% of cases, although the prediction accuracy of the methods analyzed was significantly degraded compared with the experiments using the original weights.
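The abstract does not specify how metadata similarity or the synthetic weights are computed. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: node metadata is modeled as a few boolean attributes (the names in `METADATA` are hypothetical), metadata similarity is taken to be the simple matching coefficient (the fraction of attributes on which the two end nodes agree), and a synthetic integer weight is drawn from a Poisson distribution whose rate is selected by applying a logical operator (AND, OR, XOR from Python's `operator` module) to one attribute of each end node. The helpers `metadata_similarity`, `poisson`, and `synthesize_weight` are illustrative names.

```python
import math
import operator
import random

# Hypothetical node metadata: each node has a few boolean attributes.
# Attribute names and values are illustrative assumptions only.
METADATA = {
    "a": {"member": True, "senior": False, "local": True},
    "b": {"member": True, "senior": True, "local": True},
    "c": {"member": False, "senior": False, "local": False},
}


def metadata_similarity(u, v, metadata=METADATA):
    """Fraction of shared attributes on which end nodes u and v agree.

    This is the simple matching coefficient, one possible (assumed)
    choice of similarity; other measures such as Jaccard could be used.
    """
    mu, mv = metadata[u], metadata[v]
    keys = mu.keys() & mv.keys()
    if not keys:
        return 0.0
    return sum(mu[k] == mv[k] for k in keys) / len(keys)


def poisson(rate, rng):
    """Draw a Poisson-distributed count with Knuth's multiplication method."""
    limit = math.exp(-rate)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def synthesize_weight(u, v, attr, op=operator.and_, high=10.0, low=1.0,
                      metadata=METADATA, rng=None):
    """Integer link weight that depends only on the end nodes' metadata.

    The logical operator `op` combines attribute `attr` of u and v; a true
    result selects the `high` Poisson rate, a false result the `low` rate.
    If no rng is given, a fresh Random(0) is used, so the call is repeatable.
    """
    rng = rng or random.Random(0)
    combined = op(metadata[u][attr], metadata[v][attr])
    return poisson(high if combined else low, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    for u, v in [("a", "b"), ("a", "c"), ("b", "c")]:
        sim = metadata_similarity(u, v)
        w = synthesize_weight(u, v, "member", op=operator.xor, rng=rng)
        print(u, v, round(sim, 3), w)
```

With the hypothetical metadata above, the similarities for the pairs (a, b), (a, c), and (b, c) are 2/3, 1/3, and 0. Swapping `operator.and_` for `operator.or_` or `operator.xor` changes which node pairs receive high-count weights, which is one way to encode different metadata relationships in the generation process.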
Journal: Entropy (Basel)
Publication date: 2022-06-18
License: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).