
Predicting chemical ecotoxicity by learning latent space chemical representations

In silico prediction of chemical ecotoxicity (HC50) represents an important complement to improve in vivo and in vitro toxicological assessment of manufactured chemicals. Recent application of machine learning models to predict chemical HC50 yields variable prediction performance that depends on...


Bibliographic Details
Main authors:	Gao, Feng; Zhang, Wei; Baccarelli, Andrea A.; Shen, Yike
Format:	Online Article Text
Language:	English
Published:	2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9044254/ https://www.ncbi.nlm.nih.gov/pubmed/35395577 http://dx.doi.org/10.1016/j.envint.2022.107224

_version_	1784695065719865344
author	Gao, Feng; Zhang, Wei; Baccarelli, Andrea A.; Shen, Yike
author_sort	Gao, Feng
collection	PubMed
description	In silico prediction of chemical ecotoxicity (HC50) represents an important complement to improve in vivo and in vitro toxicological assessment of manufactured chemicals. Recent application of machine learning models to predict chemical HC50 yields variable prediction performance that depends on effectively learning chemical representations from high-dimension data. To improve HC50 prediction performance, we developed an autoencoder model by learning latent space chemical embeddings. This novel approach achieved state-of-the-art prediction performance of HC50 with R² of 0.668 ± 0.003 and mean absolute error (MAE) of 0.572 ± 0.001, and outperformed other dimension reduction methods including principal component analysis (PCA) (R² = 0.601 ± 0.031 and MAE = 0.629 ± 0.005), kernel PCA (R² = 0.631 ± 0.008 and MAE = 0.625 ± 0.006), and uniform manifold approximation and projection dimensionality reduction (R² = 0.400 ± 0.008 and MAE = 0.801 ± 0.002). A simple linear layer with chemical embeddings learned from the autoencoder model performed better than random forest (R² = 0.663 ± 0.007 and MAE = 0.591 ± 0.008), fully connected neural network (R² = 0.614 ± 0.016 and MAE = 0.610 ± 0.008), least absolute shrinkage and selection operator (R² = 0.617 ± 0.037 and MAE = 0.619 ± 0.007), and ridge regression (R² = 0.638 ± 0.007 and MAE = 0.613 ± 0.005) using unlearned raw input features. Our results highlighted the usefulness of learning latent chemical representations, and our autoencoder model provides an alternative approach for robust HC50 prediction.
format	Online Article Text
id	pubmed-9044254
institution	National Center for Biotechnology Information
language	English
publishDate	2022
record_format	MEDLINE/PubMed
spelling	pubmed-9044254 2022-05-01 Predicting chemical ecotoxicity by learning latent space chemical representations Gao, Feng Zhang, Wei Baccarelli, Andrea A. Shen, Yike Environ Int Article 2022-05 2022-04-01 /pmc/articles/PMC9044254/ /pubmed/35395577 http://dx.doi.org/10.1016/j.envint.2022.107224 Text en This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle	Article Gao, Feng Zhang, Wei Baccarelli, Andrea A. Shen, Yike Predicting chemical ecotoxicity by learning latent space chemical representations
title	Predicting chemical ecotoxicity by learning latent space chemical representations
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9044254/ https://www.ncbi.nlm.nih.gov/pubmed/35395577 http://dx.doi.org/10.1016/j.envint.2022.107224
work_keys_str_mv	AT gaofeng predictingchemicalecotoxicitybylearninglatentspacechemicalrepresentations AT zhangwei predictingchemicalecotoxicitybylearninglatentspacechemicalrepresentations AT baccarelliandreaa predictingchemicalecotoxicitybylearninglatentspacechemicalrepresentations AT shenyike predictingchemicalecotoxicitybylearninglatentspacechemicalrepresentations
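
The description field above reports model performance as R² and mean absolute error (MAE). As a reading aid, here is a minimal sketch of how these two metrics are commonly computed. The function names and the toy numbers are hypothetical and chosen only for illustration; they are not data or code from the article, and the record does not state how the ± values were derived.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values invented for illustration only (not results from the article).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print("MAE:", mean_absolute_error(observed, predicted))
print("R2:", r2_score(observed, predicted))
```

A higher R² and a lower MAE both indicate a closer fit, which is the sense in which the description ranks the autoencoder above the other methods.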


Similar items