NetTIME: a multitask and base-pair resolution framework for improved transcription factor binding site prediction

Bibliographic Details
Main authors: Yi, Ren; Cho, Kyunghyun; Bonneau, Richard
Format: Online Article Text
Language: English
Published: Oxford University Press, 2022
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563695/
https://www.ncbi.nlm.nih.gov/pubmed/35997560
http://dx.doi.org/10.1093/bioinformatics/btac569
Collection: PubMed

Abstract:
MOTIVATION: Machine learning models for predicting cell-type-specific transcription factor (TF) binding sites have become increasingly more accurate thanks to the increased availability of next-generation sequencing data and more standardized model evaluation criteria. However, knowledge transfer from data-rich to data-limited TFs and cell types remains crucial for improving TF binding prediction models because available binding labels are highly skewed towards a small collection of TFs and cell types. Transfer prediction of TF binding sites can potentially benefit from a multitask learning approach; however, existing methods typically use shallow single-task models to generate low-resolution predictions. Here, we propose NetTIME, a multitask learning framework for predicting cell-type-specific TF binding sites with base-pair resolution.
RESULTS: We show that the multitask learning strategy for TF binding prediction is more efficient than the single-task approach due to the increased data availability. NetTIME trains high-dimensional embedding vectors to distinguish TF and cell-type identities. We show that this approach is critical for the success of the multitask learning strategy and allows our model to make accurate transfer predictions within and beyond the training panels of TFs and cell types. We additionally train a linear-chain conditional random field (CRF) to classify binding predictions and show that this CRF eliminates the need for setting a probability threshold and reduces classification noise. We compare our method’s predictive performance with two state-of-the-art methods, Catchitt and Leopard, and show that our method outperforms previous methods under both supervised and transfer learning settings.
AVAILABILITY AND IMPLEMENTATION: NetTIME is freely available at https://github.com/ryi06/NetTIME and the code is also archived at https://doi.org/10.5281/zenodo.6994897.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
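
The abstract states that a linear-chain CRF converts per-base-pair binding predictions into bound/unbound labels without a probability threshold and with less classification noise. The Python sketch below illustrates the general idea with Viterbi decoding over a two-state chain; it is not NetTIME's implementation. The function names `viterbi_decode` and `threshold_decode`, the example probabilities, and the transition scores are hypothetical. In NetTIME the CRF is trained, whereas here the transition scores are hand-set placeholders that simply penalize switching between labels.

```python
import math
from typing import List, Sequence

# Hypothetical label set for this sketch: 0 = unbound, 1 = bound.
LABELS = (0, 1)


def threshold_decode(probs: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Baseline: label each base pair independently with a fixed cutoff."""
    return [1 if p >= cutoff else 0 for p in probs]


def viterbi_decode(probs: Sequence[float], transition: List[List[float]]) -> List[int]:
    """Most likely label path for a two-state linear-chain CRF.

    Unary scores are the log-probabilities of each label at each base pair;
    transition[i][j] is the (hypothetical, hand-set) score for moving from
    label i at one position to label j at the next.
    """
    eps = 1e-9
    unary = [[math.log(max(1.0 - p, eps)), math.log(max(p, eps))] for p in probs]
    n = len(unary)
    if n == 0:
        return []
    score = list(unary[0])  # best path score ending in each label so far
    backptr: List[List[int]] = []
    for t in range(1, n):
        new_score: List[float] = []
        ptrs: List[int] = []
        for j in LABELS:
            # Best previous label i for reaching label j at position t.
            best_i = max(LABELS, key=lambda i: score[i] + transition[i][j])
            new_score.append(score[best_i] + transition[best_i][j] + unary[t][j])
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)
    # Trace the best path backwards from the highest-scoring final label.
    last = max(LABELS, key=lambda j: score[j])
    path = [last]
    for ptrs in reversed(backptr):
        last = ptrs[last]
        path.append(last)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical per-base-pair binding probabilities: one isolated spike
    # (index 2) and one sustained run (indices 5-8).
    probs = [0.1, 0.1, 0.9, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
    transition = [[0.0, -2.0], [-2.0, 0.0]]  # penalize label switches
    print(threshold_decode(probs))            # [0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0]
    print(viterbi_decode(probs, transition))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
```

With all transition scores set to zero, the decoder reduces to picking the more probable label at each base pair, which matches a 0.5 cutoff. The switch penalty is what lets the chain overrule the isolated spike while keeping the sustained run, which is one way a CRF can reduce noisy, fragmented calls.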

Record ID: pubmed-9563695
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: Bioinformatics (Original Papers), Oxford University Press, published 2022-08-23
License: © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com