Examining Supervised Machine Learning Methods for Integer Link Weight Prediction Using Node Metadata
Main authors: | Mori, Larissa; O’Hara, Kaleigh; Pujol, Toyya A.; Ventresca, Mario |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9223064/ https://www.ncbi.nlm.nih.gov/pubmed/35741562 http://dx.doi.org/10.3390/e24060842 |
_version_ | 1784733031199670272 |
---|---|
author | Mori, Larissa; O’Hara, Kaleigh; Pujol, Toyya A.; Ventresca, Mario |
collection | PubMed |
description | With the goal of understanding whether the information contained in node metadata can help in the task of link weight prediction, we investigate whether incorporating it as a similarity feature between the end nodes of a link (referred to as metadata similarity) improves the prediction accuracy of common supervised machine learning methods. In contrast with previous works, instead of normalizing the link weights, we treat them as count variables representing the number of interactions between end nodes, as this is a natural representation for many datasets in the literature. In this preliminary study, we find no significant evidence that metadata similarity improved the prediction accuracy on the four empirical datasets studied. To further explore the role of node metadata in weight prediction, we synthesized weights to analyze the extreme case in which the weights depend solely on the metadata of the end nodes, encoding different relationships between them with logical operators during the generation process. Under these conditions, the random forest method performed significantly better than the other methods in 99.07% of cases, although the prediction accuracy of the methods analyzed was significantly lower than in the experiments with the original weights. |
format | Online Article Text |
id | pubmed-9223064 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9223064 2022-06-24 Examining Supervised Machine Learning Methods for Integer Link Weight Prediction Using Node Metadata. Mori, Larissa; O’Hara, Kaleigh; Pujol, Toyya A.; Ventresca, Mario. Entropy (Basel). Article. MDPI, 2022-06-18. /pmc/articles/PMC9223064/ /pubmed/35741562 http://dx.doi.org/10.3390/e24060842 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Examining Supervised Machine Learning Methods for Integer Link Weight Prediction Using Node Metadata |
topic | Article |
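The abstract does not specify how metadata similarity is computed or how the synthetic weights were generated. As a minimal sketch, assuming each node's metadata is a set of categorical labels, the Python below uses the Jaccard similarity of the end nodes' label sets as the edge feature, and a hypothetical `synth_weight` generator that assigns an integer count from a logical combination (AND, OR, XOR) of whether each end node carries a given label. The function names, the Jaccard choice, and the `high`/`low` counts are illustrative assumptions, not the authors' method.

```python
"""Hypothetical sketch (not the authors' code): a metadata-similarity edge
feature and metadata-only synthetic integer link weights."""
from typing import Dict, Iterable, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity measure: len(A & B) / len(A | B), or 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_features(
    edges: Iterable[Tuple[str, str]], metadata: Dict[str, Set[str]]
) -> List[List[float]]:
    """One feature row per link (u, v); here only the end nodes' metadata similarity."""
    return [[jaccard(metadata[u], metadata[v])] for u, v in edges]


def synth_weight(
    mu: Set[str], mv: Set[str], attr: str, op: str, high: int = 5, low: int = 1
) -> int:
    """Hypothetical generator: the integer weight depends only on end-node metadata.

    The predicates "u has attr" and "v has attr" are combined with a logical
    operator; the link gets `high` interactions if the rule holds, else `low`.
    """
    x, y = attr in mu, attr in mv
    rules = {"and": x and y, "or": x or y, "xor": x != y}
    return high if rules[op] else low


if __name__ == "__main__":
    meta = {"u": {"a"}, "v": {"a", "b"}, "w": set()}
    edges = [("u", "v"), ("u", "w")]
    X = edge_features(edges, meta)  # feature rows, one per link
    y = [synth_weight(meta[u], meta[v], "a", "and") for u, v in edges]  # integer targets
    print(X, y)  # [[0.5], [0.0]] [5, 1]
```

The rows in `X` and the integer counts in `y` could then be passed to a supervised regressor, for example scikit-learn's `RandomForestRegressor`; because the targets are counts, a Poisson-loss model such as scikit-learn's `PoissonRegressor` is another option.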