Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology
Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning approaches have been used to predict...
Main Authors: | Moukheiber, Lama; Mangione, William; Moukheiber, Mira; Maleki, Saeed; Falls, Zackary; Gao, Mingchen; Samudrala, Ram |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9099959/ https://www.ncbi.nlm.nih.gov/pubmed/35566372 http://dx.doi.org/10.3390/molecules27093021 |
_version_ | 1784706735317975040 |
---|---|
author | Moukheiber, Lama Mangione, William Moukheiber, Mira Maleki, Saeed Falls, Zackary Gao, Mingchen Samudrala, Ram |
author_facet | Moukheiber, Lama Mangione, William Moukheiber, Mira Maleki, Saeed Falls, Zackary Gao, Mingchen Samudrala, Ram |
author_sort | Moukheiber, Lama |
collection | PubMed |
description | Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning approaches have been used to predict toxicity-related biological activities using chemical structure descriptors. However, toxicity-related proteomic features have not been fully investigated. In this study, we construct a computational pipeline using machine learning models for predicting the most important protein features responsible for the toxicity of compounds taken from the Tox21 dataset that is implemented within the multiscale Computational Analysis of Novel Drug Opportunities (CANDO) therapeutic discovery platform. Tox21 is a highly imbalanced dataset consisting of twelve in vitro assays, seven from the nuclear receptor (NR) signaling pathway and five from the stress response (SR) pathway, for more than 10,000 compounds. For the machine learning model, we employed a random forest with the combination of Synthetic Minority Oversampling Technique (SMOTE) and the Edited Nearest Neighbor (ENN) method (SMOTE+ENN), which is a resampling method to balance the activity class distribution. Within the NR and SR pathways, the activity of the aryl hydrocarbon receptor (NR-AhR) and the mitochondrial membrane potential (SR-MMP) were two of the top-performing twelve toxicity endpoints with AUCROCs of 0.90 and 0.92, respectively. The top extracted features for evaluating compound toxicity were analyzed for enrichment to highlight the implicated biological pathways and proteins. We validated our enrichment results for the activity of the AhR using a thorough literature search. Our case study showed that the selected enriched pathways and proteins from our computational pipeline are not only correlated with AhR toxicity but also form a cascading upstream/downstream arrangement. Our work elucidates significant relationships between protein and compound interactions computed using CANDO and the associated biological pathways to which the proteins belong for twelve toxicity endpoints. This novel study uses machine learning not only to predict and understand toxicity but also elucidates therapeutic mechanisms at a proteomic level for a variety of toxicity endpoints. |
format | Online Article Text |
id | pubmed-9099959 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9099959 2022-05-14 Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology Moukheiber, Lama Mangione, William Moukheiber, Mira Maleki, Saeed Falls, Zackary Gao, Mingchen Samudrala, Ram Molecules Article Humans are exposed to numerous compounds daily, some of which have adverse effects on health. Computational approaches for modeling toxicological data in conjunction with machine learning algorithms have gained popularity over the last few years. Machine learning approaches have been used to predict toxicity-related biological activities using chemical structure descriptors. However, toxicity-related proteomic features have not been fully investigated. In this study, we construct a computational pipeline using machine learning models for predicting the most important protein features responsible for the toxicity of compounds taken from the Tox21 dataset that is implemented within the multiscale Computational Analysis of Novel Drug Opportunities (CANDO) therapeutic discovery platform. Tox21 is a highly imbalanced dataset consisting of twelve in vitro assays, seven from the nuclear receptor (NR) signaling pathway and five from the stress response (SR) pathway, for more than 10,000 compounds. For the machine learning model, we employed a random forest with the combination of Synthetic Minority Oversampling Technique (SMOTE) and the Edited Nearest Neighbor (ENN) method (SMOTE+ENN), which is a resampling method to balance the activity class distribution. Within the NR and SR pathways, the activity of the aryl hydrocarbon receptor (NR-AhR) and the mitochondrial membrane potential (SR-MMP) were two of the top-performing twelve toxicity endpoints with AUCROCs of 0.90 and 0.92, respectively. The top extracted features for evaluating compound toxicity were analyzed for enrichment to highlight the implicated biological pathways and proteins. We validated our enrichment results for the activity of the AhR using a thorough literature search. Our case study showed that the selected enriched pathways and proteins from our computational pipeline are not only correlated with AhR toxicity but also form a cascading upstream/downstream arrangement. Our work elucidates significant relationships between protein and compound interactions computed using CANDO and the associated biological pathways to which the proteins belong for twelve toxicity endpoints. This novel study uses machine learning not only to predict and understand toxicity but also elucidates therapeutic mechanisms at a proteomic level for a variety of toxicity endpoints. MDPI 2022-05-08 /pmc/articles/PMC9099959/ /pubmed/35566372 http://dx.doi.org/10.3390/molecules27093021 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Moukheiber, Lama Mangione, William Moukheiber, Mira Maleki, Saeed Falls, Zackary Gao, Mingchen Samudrala, Ram Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology |
title | Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology |
title_full | Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology |
title_fullStr | Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology |
title_full_unstemmed | Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology |
title_short | Identifying Protein Features and Pathways Responsible for Toxicity Using Machine Learning and Tox21: Implications for Predictive Toxicology |
title_sort | identifying protein features and pathways responsible for toxicity using machine learning and tox21: implications for predictive toxicology |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9099959/ https://www.ncbi.nlm.nih.gov/pubmed/35566372 http://dx.doi.org/10.3390/molecules27093021 |