
TGPred: efficient methods for predicting target genes of a transcription factor by integrating statistics, machine learning and optimization


Bibliographic Details
Main authors: Cao, Xuewei; Zhang, Ling; Islam, Md Khairul; Zhao, Mingxia; He, Cheng; Zhang, Kui; Liu, Sanzhen; Sha, Qiuying; Wei, Hairong
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Methods Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10498345/
https://www.ncbi.nlm.nih.gov/pubmed/37711605
http://dx.doi.org/10.1093/nargab/lqad083
collection PubMed
description Four statistical selection methods for inferring transcription factor (TF)–target gene (TG) pairs were developed by coupling a mean squared error (MSE) or Huber loss function with an elastic net (ENET) or least absolute shrinkage and selection operator (Lasso) penalty. Two further methods were developed for inferring pathway gene regulatory networks (GRNs) by combining a Huber or MSE loss function with a network (Net)-based penalty. To solve these regressions, we improved an accelerated proximal gradient descent (APGD) algorithm to optimize the parameter selection process, yielding an algorithm that is as effective as, but much faster than, the commonly used convex optimization solver. When the four TF–TG identification methods were tested on synthetic data generated in a general setting, the ENET-based methods performed better than the Lasso-based methods. Synthetic data generated from two network settings were used to test Huber-Net and MSE-Net, which outperformed all other methods. The TF–TG identification methods were also tested with SND1 and gl3 overexpression transcriptomic data; when genome-wide predictions were performed, Huber-ENET and MSE-ENET outperformed all other methods. The TF–TG identification methods fill the gap left by the lack of a method for genome-wide prediction of the TGs of a TF and have potential for validating ChIP/DAP-seq results, while the two Net-based methods are instrumental for predicting pathway GRNs.
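The four TF–TG selection methods differ only in their loss (MSE or Huber) and penalty (ENET or Lasso), so they share one template: a smooth data-fit term plus a penalty whose L1 part has a cheap proximal operator. The sketch below is a generic FISTA-style accelerated proximal gradient loop under stated assumptions; it is not the authors' improved APGD and not the TGPred code. The function name `apgd_fit`, the 1/(2n) MSE scaling, the Huber threshold `huber_m=1.345`, and the optional `penalty_matrix` (a graph Laplacian standing in for a network-based penalty, whose exact form in the paper is not reproduced here) are all illustrative assumptions. Which genes serve as response and predictors, and how λ and α are tuned, are likewise not shown.

```python
import numpy as np


def soft_threshold(v, thresh):
    """Proximal operator of thresh * ||.||_1 (elementwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def apgd_fit(X, y, lam, alpha=0.5, loss="mse", huber_m=1.345,
             penalty_matrix=None, max_iter=5000, tol=1e-8):
    """Hypothetical helper (not the TGPred API). Minimises

        L(b) + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * b' S b)

    where L is the MSE loss (1/(2n)) * sum r_i^2 or the Huber loss
    (1/n) * sum rho_M(r_i), with r = y - X b. S defaults to the identity
    (elastic net); alpha = 1 gives the Lasso; passing a graph Laplacian as S
    gives a network-style penalty (assumed form).
    """
    if loss not in ("mse", "huber"):
        raise ValueError("loss must be 'mse' or 'huber'")
    n, p = X.shape
    S = np.eye(p) if penalty_matrix is None else penalty_matrix
    ridge = lam * (1.0 - alpha)

    def grad(b):
        r = y - X @ b
        # Huber's derivative clips residuals at +/- huber_m; MSE uses them as-is.
        psi = r if loss == "mse" else np.clip(r, -huber_m, huber_m)
        # Gradient of the smooth part: data-fit term plus quadratic penalty term.
        return -(X.T @ psi) / n + ridge * (S @ b)

    # Both losses have second derivative <= 1, so this bounds the Lipschitz constant.
    lip = np.linalg.norm(X, 2) ** 2 / n + ridge * np.linalg.norm(S, 2)
    step = 1.0 / lip

    beta = np.zeros(p)
    z = beta.copy()
    t = 1.0
    for _ in range(max_iter):
        # Proximal gradient step taken from the extrapolated point z.
        beta_new = soft_threshold(z - step * grad(z), step * lam * alpha)
        # Nesterov/FISTA momentum update.
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = beta_new + ((t - 1.0) / t_new) * (beta_new - beta)
        done = np.linalg.norm(beta_new - beta) <= tol * max(1.0, np.linalg.norm(beta))
        beta, t = beta_new, t_new
        if done:
            break
    return beta


if __name__ == "__main__":
    # Simulated toy data, unrelated to the paper's data: 3 of 20 predictors are active.
    rng = np.random.default_rng(0)
    n, p = 500, 20
    X = rng.standard_normal((n, p))
    true_beta = np.zeros(p)
    true_beta[[0, 5, 10]] = [3.0, -2.0, 1.5]
    y = X @ true_beta + 0.5 * rng.standard_normal(n)
    for loss in ("mse", "huber"):
        for alpha in (0.5, 1.0):  # 0.5: ENET-style mix, 1.0: Lasso
            b = apgd_fit(X, y, lam=0.5, alpha=alpha, loss=loss)
            print(loss, alpha, np.flatnonzero(np.abs(b) > 1e-6))
```

Keeping the quadratic part of the penalty in the smooth term means the proximal step is plain soft thresholding for every variant, so swapping the identity for a Laplacian changes only the gradient and the step size, not the update rule. Predictors with nonzero coefficients are the selected ones.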
id pubmed-10498345
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal NAR Genom Bioinform (Methods Article), Oxford University Press, published 2023-09-13
rights © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methods Article