A feature transferring workflow between data-poor compounds in various tasks

Compound screening by in silico approaches has advantages in identifying high-activity leading compounds and can predict the safety of the drug. A key challenge is that the number of observations of drug activity and toxicity accumulation varies by target in different datasets, some of which are mor...

Bibliographic Details
Main Authors:	Sun, Xiaofei; Zhu, Jingyuan; Chen, Bin; You, Hengzhi; Xu, Huiqing
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8967016/ https://www.ncbi.nlm.nih.gov/pubmed/35353844 http://dx.doi.org/10.1371/journal.pone.0266088

author	Sun, Xiaofei Zhu, Jingyuan Chen, Bin You, Hengzhi Xu, Huiqing
collection	PubMed
description	Compound screening by in silico approaches has advantages in identifying high-activity leading compounds and can predict the safety of the drug. A key challenge is that the number of observations of drug activity and toxicity accumulation varies by target in different datasets, some of which are more understudied than others. Owing to an overall insufficiency and imbalance of drug data, it is hard to accurately predict drug activity and toxicity of multiple tasks by the existing models. To solve this problem, this paper proposed a two-stage transfer learning workflow to develop a novel prediction model, which can accurately predict drug activity and toxicity of the targets with insufficient observations. We built a balanced dataset based on the Tox21 dataset and developed a drug activity and toxicity prediction model based on Siamese networks and graph convolution to produce multitasking output. We also took advantage of transfer learning from data-rich targets to data-poor targets. We showed greater accuracy in predicting the activity and toxicity of compounds to targets with rich data and poor data. In Tox21, a relatively rich dataset, the prediction model accuracy for classification tasks was 0.877 AUROC. In the other five unbalanced datasets, we also found that transfer learning strategies brought the accuracy of models to a higher level in understudied targets. Our models can overcome the imbalance in target data and predict the compound activity and toxicity of understudied targets to help prioritize upcoming biological experiments.
format	Online Article Text
id	pubmed-8967016
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-8967016 2022-03-31 PLoS One Research Article Public Library of Science 2022-03-30 /pmc/articles/PMC8967016/ /pubmed/35353844 http://dx.doi.org/10.1371/journal.pone.0266088 Text en © 2022 Sun et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	A feature transferring workflow between data-poor compounds in various tasks
topic	Research Article
