
Machine Learning Models for Identification and Prediction of Toxic Organic Compounds Using Daphnia magna Transcriptomic Profiles

A wide range of environmental factors heavily impact aquatic ecosystems, in turn affecting human health. Toxic organic compounds resulting from anthropogenic activity are a source of pollution in aquatic ecosystems. To evaluate these contaminants, current approaches mainly rely on acute and chronic...


Bibliographic Details
Main Authors: Choi, Tae-June; An, Hyung-Eun; Kim, Chang-Bae
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9503646/
https://www.ncbi.nlm.nih.gov/pubmed/36143479
http://dx.doi.org/10.3390/life12091443
_version_ 1784796017568251904
author Choi, Tae-June
An, Hyung-Eun
Kim, Chang-Bae
collection PubMed
description A wide range of environmental factors heavily impact aquatic ecosystems, in turn affecting human health. Toxic organic compounds resulting from anthropogenic activity are a source of pollution in aquatic ecosystems. To evaluate these contaminants, current approaches mainly rely on acute and chronic toxicity tests, but these cannot provide explicit insights into the causes of toxicity. As an alternative, genome-wide gene expression systems allow the identification of contaminants causing toxicity by monitoring the organisms’ response to toxic substances. In this study, we selected 22 toxic organic compounds, classified as pesticides, herbicides, or industrial chemicals, that induce environmental problems in aquatic ecosystems and affect human health. To identify toxic organic compounds using gene expression data from Daphnia magna, we evaluated the performance of three machine-learning-based feature-ranking algorithms (Learning Vector Quantization, Random Forest, and Support Vector Machines with a Linear kernel) and nine classifiers (Linear Discriminant Analysis, Classification And Regression Trees, K-nearest neighbors, Support Vector Machines with a Linear kernel, Random Forest, Boosted C5.0, Gradient Boosting Machine, eXtreme Gradient Boosting with tree, and eXtreme Gradient Boosting with DART booster). Our analysis revealed that a combination of feature selection based on feature-ranking and a random forest classification algorithm had the best model performance, with an accuracy of 95.7%. This is a preliminary study to establish a model for the monitoring of aquatic toxic substances by machine learning. This model could be an effective tool to manage contaminants and toxic organic compounds in aquatic systems.
format Online
Article
Text
id pubmed-9503646
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9503646 2022-09-24 Machine Learning Models for Identification and Prediction of Toxic Organic Compounds Using Daphnia magna Transcriptomic Profiles Choi, Tae-June An, Hyung-Eun Kim, Chang-Bae Life (Basel) Brief Report
MDPI 2022-09-16 /pmc/articles/PMC9503646/ /pubmed/36143479 http://dx.doi.org/10.3390/life12091443 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Machine Learning Models for Identification and Prediction of Toxic Organic Compounds Using Daphnia magna Transcriptomic Profiles
topic Brief Report