Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology

Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning approaches have been used to predict...

Bibliographic Details
Main Authors: Moukheiber, Lama, Mangione, William, Moukheiber, Mira, Maleki, Saeed, Falls, Zackary, Gao, Mingchen, Samudrala, Ram
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9099959/
https://www.ncbi.nlm.nih.gov/pubmed/35566372
http://dx.doi.org/10.3390/molecules27093021
_version_ 1784706735317975040
author Moukheiber, Lama
Mangione, William
Moukheiber, Mira
Maleki, Saeed
Falls, Zackary
Gao, Mingchen
Samudrala, Ram
author_facet Moukheiber, Lama
Mangione, William
Moukheiber, Mira
Maleki, Saeed
Falls, Zackary
Gao, Mingchen
Samudrala, Ram
author_sort Moukheiber, Lama
collection PubMed
description Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning approaches have been used to predict toxicity-related biological activities using chemical structure descriptors. However, toxicity-related proteomic features have not been fully investigated. In this study, we construct a computational pipeline using machine learning models for predicting the most important protein features responsible for the toxicity of compounds taken from the Tox21 dataset that is implemented within the multiscale Computational Analysis of Novel Drug Opportunities (CANDO) therapeutic discovery platform. Tox21 is a highly imbalanced dataset consisting of twelve in vitro assays, seven from the nuclear receptor (NR) signaling pathway and five from the stress response (SR) pathway, for more than 10,000 compounds. For the machine learning model, we employed a random forest with the combination of Synthetic Minority Oversampling Technique (SMOTE) and the Edited Nearest Neighbor (ENN) method (SMOTE+ENN), which is a resampling method to balance the activity class distribution. Within the NR and SR pathways, the activity of the aryl hydrocarbon receptor (NR-AhR) and the mitochondrial membrane potential (SR-MMP) were two of the top-performing twelve toxicity endpoints with AUCROCs of 0.90 and 0.92, respectively. The top extracted features for evaluating compound toxicity were analyzed for enrichment to highlight the implicated biological pathways and proteins. We validated our enrichment results for the activity of the AhR using a thorough literature search. Our case study showed that the selected enriched pathways and proteins from our computational pipeline are not only correlated with AhR toxicity but also form a cascading upstream/downstream arrangement. 
Our work elucidates significant relationships between protein and compound interactions computed using CANDO and the associated biological pathways to which the proteins belong for twelve toxicity endpoints. This novel study not only uses machine learning to predict and understand toxicity but also elucidates therapeutic mechanisms at a proteomic level for a variety of toxicity endpoints.
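The SMOTE+ENN resampling step named in the description can be illustrated with a minimal sketch. This is not the authors' code and does not use Tox21 or CANDO data: the helper names (`sq_dist`, `nearest`, `smote`, `enn`), the two-dimensional toy points, and the choice of k = 3 are all assumptions made for illustration. The ENN step here removes a sample when the majority of its neighbours disagree with its label; library implementations such as `imblearn.combine.SMOTEENN` may use a stricter rule by default.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    others = (j for j in range(len(points)) if j != idx)
    return sorted(others, key=lambda j: sq_dist(points[idx], points[j]))[:k]

def smote(X, y, minority, k=3, rng=None):
    """Add synthetic minority samples until both classes have equal size.

    Each synthetic point is a random interpolation between a minority
    sample and one of its k nearest minority neighbours (binary labels assumed).
    """
    if rng is None:
        rng = random.Random(0)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(0, n_needed)):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(nearest(i, minority_pts, k))
        t = rng.random()
        a, b = minority_pts[i], minority_pts[j]
        X_new.append(tuple(p + t * (q - p) for p, q in zip(a, b)))
        y_new.append(minority)
    return X_new, y_new

def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop samples whose label disagrees with
    the majority vote of their k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy imbalanced data: 40 "inactive" (0) and 5 "active" (1) compounds.
rng = random.Random(42)
inactive = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
active = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(5)]
X = inactive + active
y = [0] * 40 + [1] * 5

X_sm, y_sm = smote(X, y, minority=1, rng=rng)   # oversample the minority class
X_res, y_res = enn(X_sm, y_sm)                  # then clean ambiguous samples

print("original:", Counter(y))
print("after SMOTE:", Counter(y_sm))
print("after SMOTE+ENN:", Counter(y_res))
```

In a pipeline like the one described, the resampled set would then be passed to a classifier such as a random forest; resampling is normally applied to the training split only, so that evaluation metrics such as the area under the ROC curve are computed on untouched data.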
format Online
Article
Text
id pubmed-9099959
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-90999592022-05-14 Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology Moukheiber, Lama Mangione, William Moukheiber, Mira Maleki, Saeed Falls, Zackary Gao, Mingchen Samudrala, Ram Molecules Article Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning approaches have been used to predict toxicity-related biological activities using chemical structure descriptors. However, toxicity-related proteomic features have not been fully investigated. In this study, we construct a computational pipeline using machine learning models for predicting the most important protein features responsible for the toxicity of compounds taken from the Tox21 dataset that is implemented within the multiscale Computational Analysis of Novel Drug Opportunities (CANDO) therapeutic discovery platform. Tox21 is a highly imbalanced dataset consisting of twelve in vitro assays, seven from the nuclear receptor (NR) signaling pathway and five from the stress response (SR) pathway, for more than 10,000 compounds. For the machine learning model, we employed a random forest with the combination of Synthetic Minority Oversampling Technique (SMOTE) and the Edited Nearest Neighbor (ENN) method (SMOTE+ENN), which is a resampling method to balance the activity class distribution. Within the NR and SR pathways, the activity of the aryl hydrocarbon receptor (NR-AhR) and the mitochondrial membrane potential (SR-MMP) were two of the top-performing twelve toxicity endpoints with AUCROCs of 0.90 and 0.92, respectively. The top extracted features for evaluating compound toxicity were analyzed for enrichment to highlight the implicated biological pathways and proteins. 
We validated our enrichment results for the activity of the AhR using a thorough literature search. Our case study showed that the selected enriched pathways and proteins from our computational pipeline are not only correlated with AhR toxicity but also form a cascading upstream/downstream arrangement. Our work elucidates significant relationships between protein and compound interactions computed using CANDO and the associated biological pathways to which the proteins belong for twelve toxicity endpoints. This novel study not only uses machine learning to predict and understand toxicity but also elucidates therapeutic mechanisms at a proteomic level for a variety of toxicity endpoints. MDPI 2022-05-08 /pmc/articles/PMC9099959/ /pubmed/35566372 http://dx.doi.org/10.3390/molecules27093021 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Moukheiber, Lama
Mangione, William
Moukheiber, Mira
Maleki, Saeed
Falls, Zackary
Gao, Mingchen
Samudrala, Ram
Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology
title Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology
title_full Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology
title_fullStr Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology
title_full_unstemmed Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology
title_short Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology
title_sort identifying protein features and pathways responsible for toxicity using machine learning and tox21: implications for predictive toxicology
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9099959/
https://www.ncbi.nlm.nih.gov/pubmed/35566372
http://dx.doi.org/10.3390/molecules27093021
work_keys_str_mv AT moukheiberlama identifyingproteinfeaturesandpathwaysresponsiblefortoxicityusingmachinelearningandtox21implicationsforpredictivetoxicology
AT mangionewilliam identifyingproteinfeaturesandpathwaysresponsiblefortoxicityusingmachinelearningandtox21implicationsforpredictivetoxicology
AT moukheibermira identifyingproteinfeaturesandpathwaysresponsiblefortoxicityusingmachinelearningandtox21implicationsforpredictivetoxicology
AT malekisaeed identifyingproteinfeaturesandpathwaysresponsiblefortoxicityusingmachinelearningandtox21implicationsforpredictivetoxicology
AT fallszackary identifyingproteinfeaturesandpathwaysresponsiblefortoxicityusingmachinelearningandtox21implicationsforpredictivetoxicology
AT gaomingchen identifyingproteinfeaturesandpathwaysresponsiblefortoxicityusingmachinelearningandtox21implicationsforpredictivetoxicology
AT samudralaram identifyingproteinfeaturesandpathwaysresponsiblefortoxicityusingmachinelearningandtox21implicationsforpredictivetoxicology