
MTTFsite: cross-cell type TF binding site prediction by using multi-task learning

MOTIVATION: The prediction of transcription factor binding sites (TFBSs) is crucial for gene expression analysis. Supervised learning approaches for TFBS predictions require large amounts of labeled data. However, many TFs of certain cell types either do not have sufficient labeled data or do not ha...


Bibliographic Details
Main authors: Zhou, Jiyun; Lu, Qin; Gui, Lin; Xu, Ruifeng; Long, Yunfei; Wang, Hongpeng
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954652/
https://www.ncbi.nlm.nih.gov/pubmed/31161194
http://dx.doi.org/10.1093/bioinformatics/btz451
author Zhou, Jiyun
Lu, Qin
Gui, Lin
Xu, Ruifeng
Long, Yunfei
Wang, Hongpeng
collection PubMed
description MOTIVATION: The prediction of transcription factor binding sites (TFBSs) is crucial for gene expression analysis. Supervised learning approaches for TFBS prediction require large amounts of labeled data. However, many TFs of certain cell types either do not have sufficient labeled data or do not have any labeled data. RESULTS: In this paper, a multi-task learning framework (called MTTFsite) is proposed to address the lack of labeled data by leveraging labeled data available across cell types. The proposed MTTFsite contains a shared CNN to learn common features for all cell types and a private CNN for each cell type to learn private features. The common features are intended to help predict TFBSs for all cell types, especially those that lack labeled data. MTTFsite is evaluated on 241 cell type-TF pairs and compared with a baseline method that uses no multi-task learning and a fully shared multi-task model that uses only a shared CNN and no private CNNs. For cell types with insufficient labeled data, results show that MTTFsite performs better than the baseline method and the fully shared model on more than 89% of pairs. For cell types without any labeled data, MTTFsite outperforms the baseline method and the fully shared model on more than 80% and 93% of pairs, respectively. A novel gene expression prediction method (called TFChrome) using both MTTFsite and histone modification features is also presented. Results show that TFBSs predicted by MTTFsite alone can achieve good performance. When MTTFsite is combined with histone modification features, a significant 5.7% performance improvement is obtained. AVAILABILITY AND IMPLEMENTATION: The resource and executable code are freely available at http://hlt.hitsz.edu.cn/MTTFsite/ and http://www.hitsz-hlt.com:8080/MTTFsite/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6954652
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-69546522020-01-16 MTTFsite: cross-cell type TF binding site prediction by using multi-task learning Zhou, Jiyun Lu, Qin Gui, Lin Xu, Ruifeng Long, Yunfei Wang, Hongpeng Bioinformatics Original Papers MOTIVATION: The prediction of transcription factor binding sites (TFBSs) is crucial for gene expression analysis. Supervised learning approaches for TFBS predictions require large amounts of labeled data. However, many TFs of certain cell types either do not have sufficient labeled data or do not have any labeled data. RESULTS: In this paper, a multi-task learning framework (called MTTFsite) is proposed to address the lack of labeled data problem by leveraging on labeled data available in cross-cell types. The proposed MTTFsite contains a shared CNN to learn common features for all cell types and a private CNN for each cell type to learn private features. The common features are aimed to help predicting TFBSs for all cell types especially those cell types that lack labeled data. MTTFsite is evaluated on 241 cell type TF pairs and compared with a baseline method without using any multi-task learning model and a fully shared multi-task model that uses only a shared CNN and do not use private CNNs. For cell types with insufficient labeled data, results show that MTTFsite performs better than the baseline method and the fully shared model on more than 89% pairs. For cell types without any labeled data, MTTFsite outperforms the baseline method and the fully shared model by more than 80 and 93% pairs, respectively. A novel gene expression prediction method (called TFChrome) using both MTTFsite and histone modification features is also presented. Results show that TFBSs predicted by MTTFsite alone can achieve good performance. When MTTFsite is combined with histone modification features, a significant 5.7% performance improvement is obtained. 
AVAILABILITY AND IMPLEMENTATION: The resource and executable code are freely available at http://hlt.hitsz.edu.cn/MTTFsite/ and http://www.hitsz-hlt.com:8080/MTTFsite/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-12-15 2019-06-04 /pmc/articles/PMC6954652/ /pubmed/31161194 http://dx.doi.org/10.1093/bioinformatics/btz451 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title MTTFsite: cross-cell type TF binding site prediction by using multi-task learning
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954652/
https://www.ncbi.nlm.nih.gov/pubmed/31161194
http://dx.doi.org/10.1093/bioinformatics/btz451
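
Illustrative note: the description above says MTTFsite combines a shared CNN (common features for all cell types) with a private CNN per cell type. The following is a minimal sketch of that shared-plus-private idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the class name SharedPrivateModel, the cell type labels "cell_A" and "cell_B", the filter counts and widths, the single convolution layer, and the random untrained weights.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def random_filters(rng, n_filters, width):
    """Hypothetical untrained filters: n_filters x width x 4 random weights."""
    return [[[rng.uniform(-0.5, 0.5) for _ in BASES] for _ in range(width)]
            for _ in range(n_filters)]

def conv_max_pool(x, filters):
    """1D convolution, ReLU, then global max pooling: one feature per filter."""
    features = []
    for f in filters:
        width = len(f)
        best = 0.0  # starting at 0 acts as the ReLU floor
        for start in range(len(x) - width + 1):
            act = sum(x[start + i][j] * f[i][j]
                      for i in range(width) for j in range(4))
            best = max(best, act)
        features.append(best)
    return features

class SharedPrivateModel:
    """Sketch: one shared filter bank plus one private bank per cell type."""

    def __init__(self, cell_types, n_filters=4, width=5, seed=0):
        rng = random.Random(seed)
        self.n_filters = n_filters
        # Shared across all cell types (assumed sizes, not from the paper).
        self.shared = random_filters(rng, n_filters, width)
        # One private filter bank and one output layer per cell type.
        self.private = {c: random_filters(rng, n_filters, width)
                        for c in cell_types}
        self.out = {c: [rng.uniform(-0.5, 0.5) for _ in range(2 * n_filters)]
                    for c in cell_types}

    def features(self, seq, cell_type):
        """Concatenate shared features with the cell type's private features."""
        x = one_hot(seq)
        return (conv_max_pool(x, self.shared)
                + conv_max_pool(x, self.private[cell_type]))

    def predict(self, seq, cell_type):
        """Logistic output: a binding score in (0, 1) for this cell type."""
        feats = self.features(seq, cell_type)
        z = sum(w * f for w, f in zip(self.out[cell_type], feats))
        return 1.0 / (1.0 + math.exp(-z))

model = SharedPrivateModel(["cell_A", "cell_B"])
seq = "ACGTTGCAACGTAGCTAGCT"
for cell in ("cell_A", "cell_B"):
    print(cell, round(model.predict(seq, cell), 3))
```

In this sketch the first half of each feature vector comes from the shared filters and is therefore identical for every cell type, while the second half depends on the cell type. In a trained multi-task setup, data from all cell types would update the shared filters, which is how a cell type with little or no labeled data could still benefit.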