
Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees

Substances that do not degrade over time have proven to be harmful to the environment and are dangerous to living organisms. Being able to predict the biodegradability of substances without costly experiments is useful. Recently, the quantitative structure–activity relationship (QSAR) models have pr...


Bibliographic Details
Main authors: Elsayad, Alaa M., Nassef, Ahmed M., Al-Dhaifallah, Mujahed, Elsayad, Khaled A.
Format: Online Article Text
Language: English
Published: MDPI 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7763457/
https://www.ncbi.nlm.nih.gov/pubmed/33322123
http://dx.doi.org/10.3390/ijerph17249322
_version_ 1783628023301406720
author Elsayad, Alaa M.
Nassef, Ahmed M.
Al-Dhaifallah, Mujahed
Elsayad, Khaled A.
author_facet Elsayad, Alaa M.
Nassef, Ahmed M.
Al-Dhaifallah, Mujahed
Elsayad, Khaled A.
author_sort Elsayad, Alaa M.
collection PubMed
description Substances that do not degrade over time have proven to be harmful to the environment and are dangerous to living organisms. Being able to predict the biodegradability of substances without costly experiments is useful. Recently, the quantitative structure–activity relationship (QSAR) models have proposed effective solutions to this problem. However, the molecular descriptor datasets usually suffer from the problems of unbalanced class distribution, which adversely affects the efficiency and generalization of the derived models. Accordingly, this study aims at validating the performances of balanced random trees (RTs) and boosted C5.0 decision trees (DTs) to construct QSAR models to classify the ready biodegradation of substances and their abilities to deal with unbalanced data. The balanced RTs model algorithm builds individual trees using balanced bootstrap samples, while the boosted C5.0 DT is modeled using cost-sensitive learning. We employed the two-dimensional molecular descriptor dataset, which is publicly available through the University of California, Irvine (UCI) machine learning repository. The molecular descriptors were ranked according to their contributions to the balanced RTs classification process. The performance of the proposed models was compared with previously reported results. Based on the statistical measures, the experimental results showed that the proposed models outperform the classification results of the support vector machine (SVM), K-nearest neighbors (KNN), and discrimination analysis (DA). Classification measures were analyzed in terms of accuracy, sensitivity, specificity, precision, false positive rate, false negative rate, F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUROC).
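As an illustration of two ideas named in the description, the following minimal Python sketch draws one balanced bootstrap sample and computes the listed confusion-matrix measures for a binary classifier. It is not the authors' implementation: the function names `balanced_bootstrap` and `binary_measures` are hypothetical, and downsampling every class to the size of the smallest class is an assumption, since the abstract does not say how the balanced samples are drawn.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, rng):
    """Return row indices for one balanced bootstrap sample.

    Assumption: every class contributes as many rows as the smallest
    class has, drawn with replacement. Other balancing schemes exist.
    """
    rows_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        rows_by_class[label].append(index)
    per_class = min(len(rows) for rows in rows_by_class.values())
    sample = []
    for rows in rows_by_class.values():
        sample.extend(rng.choices(rows, k=per_class))
    return sample


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_measures(y_true, y_pred, positive=1):
    """Compute confusion-matrix measures for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "precision": _ratio(tp, tp + fp),
        "false_positive_rate": _ratio(fp, fp + tn),
        "false_negative_rate": _ratio(fn, fn + tp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Toy labels only: 8 rows of class 0 and 2 rows of class 1.
    toy_labels = [0] * 8 + [1] * 2
    indices = balanced_bootstrap(toy_labels, random.Random(0))
    print(sorted(toy_labels[i] for i in indices))
    print(binary_measures([1, 1, 0, 0, 0, 1], [1, 0, 0, 0, 1, 1]))
```

In a balanced random trees setting, each tree would be fitted on the rows returned by one such call, so that no single tree sees the original class imbalance. The labels used here are toy values, not data from the UCI dataset.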
format Online
Article
Text
id pubmed-7763457
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7763457 2020-12-27 Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees Elsayad, Alaa M. Nassef, Ahmed M. Al-Dhaifallah, Mujahed Elsayad, Khaled A. Int J Environ Res Public Health Article MDPI 2020-12-13 2020-12 /pmc/articles/PMC7763457/ /pubmed/33322123 http://dx.doi.org/10.3390/ijerph17249322 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Elsayad, Alaa M.
Nassef, Ahmed M.
Al-Dhaifallah, Mujahed
Elsayad, Khaled A.
Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees
title Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees
title_full Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees
title_fullStr Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees
title_full_unstemmed Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees
title_short Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees
title_sort classification of biodegradable substances using balanced random trees and boosted c5.0 decision trees
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7763457/
https://www.ncbi.nlm.nih.gov/pubmed/33322123
http://dx.doi.org/10.3390/ijerph17249322
work_keys_str_mv AT elsayadalaam classificationofbiodegradablesubstancesusingbalancedrandomtreesandboostedc50decisiontrees
AT nassefahmedm classificationofbiodegradablesubstancesusingbalancedrandomtreesandboostedc50decisiontrees
AT aldhaifallahmujahed classificationofbiodegradablesubstancesusingbalancedrandomtreesandboostedc50decisiontrees
AT elsayadkhaleda classificationofbiodegradablesubstancesusingbalancedrandomtreesandboostedc50decisiontrees