TGPred: efficient methods for predicting target genes of a transcription factor by integrating statistics, machine learning and optimization
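Main authors: | Cao, Xuewei; Zhang, Ling; Islam, Md Khairul; Zhao, Mingxia; He, Cheng; Zhang, Kui; Liu, Sanzhen; Sha, Qiuying; Wei, Hairong |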
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10498345/ https://www.ncbi.nlm.nih.gov/pubmed/37711605 http://dx.doi.org/10.1093/nargab/lqad083 |
_version_ | 1785105501455908864 |
---|---|
author | Cao, Xuewei; Zhang, Ling; Islam, Md Khairul; Zhao, Mingxia; He, Cheng; Zhang, Kui; Liu, Sanzhen; Sha, Qiuying; Wei, Hairong |
collection | PubMed |
description | Four statistical selection methods for inferring transcription factor (TF)–target gene (TG) pairs were developed by coupling a mean squared error (MSE) or Huber loss function with an elastic net (ENET) or least absolute shrinkage and selection operator (Lasso) penalty. Two further methods were developed for inferring pathway gene regulatory networks (GRNs) by combining the Huber or MSE loss function with a network (Net)-based penalty. To solve these regressions, we improved an accelerated proximal gradient descent (APGD) algorithm to optimize the parameter selection process, yielding an algorithm that is as effective as, but much faster than, the commonly used convex optimization solver. Synthetic data generated in a general setting were used to test the four TF–TG identification methods; ENET-based methods performed better than Lasso-based methods. Synthetic data generated from two network settings were used to test Huber-Net and MSE-Net, which outperformed all other methods. The TF–TG identification methods were also tested with SND1 and gl3 overexpression transcriptomic data; Huber-ENET and MSE-ENET outperformed all other methods when genome-wide predictions were performed. The TF–TG identification methods fill the gap left by the lack of a method for genome-wide TG prediction of a TF and have potential for validating ChIP/DAP-seq results, while the two Net-based methods are instrumental for predicting pathway GRNs. |
format | Online Article Text |
id | pubmed-10498345 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10498345 2023-09-14; NAR Genom Bioinform; Methods Article; Oxford University Press, 2023-09-13; /pmc/articles/PMC10498345/ /pubmed/37711605 http://dx.doi.org/10.1093/nargab/lqad083; Text; en; © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | TGPred: efficient methods for predicting target genes of a transcription factor by integrating statistics, machine learning and optimization |
topic | Methods Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10498345/ https://www.ncbi.nlm.nih.gov/pubmed/37711605 http://dx.doi.org/10.1093/nargab/lqad083 |
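The description above pairs a smooth loss (MSE or Huber) with an elastic-net or Lasso penalty and solves the resulting regression with accelerated proximal gradient descent. The following is a minimal pure-Python sketch of that combination for a generic design matrix `X` and response `y`, under stated assumptions. It uses a plain FISTA-style momentum schedule with a fixed step size. It scales the loss as (1/n)·Σ ℓ(r), with ℓ(r) = r²/2 for MSE and the standard Huber function with threshold `delta`. Lasso is treated as the special case `alpha = 1`. The function names (`apgd_enet`, `soft_threshold`, `psi`) are hypothetical and are not TGPred's API. The paper's improved APGD and its parameter-selection procedure are not reproduced here, and the Net-based penalty is omitted.

```python
import math


def soft_threshold(v, thr):
    """Elementwise soft-thresholding: the proximal operator of thr * |v|."""
    return [math.copysign(max(abs(x) - thr, 0.0), x) for x in v]


def psi(r, loss, delta):
    """Derivative of the per-sample loss: identity for MSE, clipped for Huber."""
    if loss == "mse":
        return r
    return max(-delta, min(delta, r))


def apgd_enet(X, y, lam, alpha, loss="mse", delta=1.0, max_iter=5000, tol=1e-12):
    """Hypothetical sketch: accelerated proximal gradient (FISTA-style) for
    (1/n) * sum_i loss(y_i - x_i . b) + lam * (alpha*||b||_1 + (1-alpha)/2*||b||_2^2),
    where loss(r) = r^2/2 (MSE) or the Huber function with threshold delta.
    alpha = 1 gives the Lasso penalty."""
    n, p = len(X), len(X[0])
    # Assumed safe step size: ||X||_F^2 / n upper-bounds the Lipschitz constant
    # of the gradient, because psi has slope at most 1 for both losses.
    lip = sum(v * v for row in X for v in row) / n
    step = 1.0 / lip

    def grad(b):
        # Gradient of the smooth loss term: -(1/n) * X^T psi(y - X b).
        res = [yi - sum(xij * bj for xij, bj in zip(row, b)) for row, yi in zip(X, y)]
        g = [0.0] * p
        for row, r in zip(X, res):
            w = psi(r, loss, delta)
            for j in range(p):
                g[j] -= row[j] * w / n
        return g

    beta = [0.0] * p
    z = beta[:]
    t = 1.0
    for _ in range(max_iter):
        g = grad(z)
        v = [zj - step * gj for zj, gj in zip(z, g)]
        # Elastic-net proximal step: soft-threshold (L1 part), then shrink (L2 part).
        shrink = 1.0 + step * lam * (1.0 - alpha)
        new = [bj / shrink for bj in soft_threshold(v, step * lam * alpha)]
        # Nesterov/FISTA momentum update of the extrapolation point z.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [a + ((t - 1.0) / t_new) * (a - b) for a, b in zip(new, beta)]
        done = max(abs(a - b) for a, b in zip(new, beta)) < tol
        beta, t = new, t_new
        if done:
            break
    return beta
```

Consider a toy design with four samples and three mutually orthogonal ±1 columns, where `y` is twice the first column. On this input, `apgd_enet(X, y, lam=0.1, alpha=0.5)` drives the first coefficient to 13/7 ≈ 1.857, the closed-form elastic-net solution for an orthogonal design, and leaves the other two coefficients at zero. Setting `alpha=1.0` (Lasso) gives 1.9 instead. The fixed step 1/L with L = ‖X‖_F²/n is a cheap, conservative choice. A tighter L, for example from a power iteration on XᵀX/n, or a backtracking line search would allow larger steps.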