Improved linking of motifs to their TFs using domain information

MOTIVATION: A central aim of molecular biology is to identify mechanisms of transcriptional regulation. Transcription factors (TFs), which are DNA-binding proteins, are highly involved in these processes, so it is crucial to know where TFs interact with DNA and to be aware of the TFs’ D...

Bibliographic Details
Main Authors:	Baumgarten, Nina, Schmidt, Florian, Schulz, Marcel H
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Review
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703792/ https://www.ncbi.nlm.nih.gov/pubmed/31742324 http://dx.doi.org/10.1093/bioinformatics/btz855

author	Baumgarten, Nina; Schmidt, Florian; Schulz, Marcel H
collection	PubMed
description	MOTIVATION: A central aim of molecular biology is to identify mechanisms of transcriptional regulation. Transcription factors (TFs), which are DNA-binding proteins, are highly involved in these processes, so it is crucial to know where TFs interact with DNA and to know the TFs’ DNA-binding motifs. For that reason, computational tools exist that link DNA-binding motifs to TFs either without sequence information or based on TF-associated sequences, e.g. identified via a chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiment. In this paper, we present MASSIF, a novel method to improve the performance of existing tools that link motifs to TFs relying on TF-associated sequences. MASSIF is based on the idea that a DNA-binding motif that is correctly linked to a TF should be assigned to a DNA-binding domain (DBD) similar to that of the mapped TF. Because DNA-binding motifs are in general not linked to DBDs, it is not possible to compare the DBD of a TF and the motif directly. Instead, we created a DBD collection, which consists of TFs with a known DBD and an associated motif. This collection enables us to evaluate how likely it is that a linked motif and a TF of interest are associated with the same DBD. We call this similarity measure the domain score and represent it as a P-value. We developed two different ways to improve the performance of existing tools that link motifs to TFs based on TF-associated sequences: (i) using meta-analysis to combine P-values from one or several of these tools with the P-value of the domain score and (ii) filtering unlikely motifs based on the domain score. RESULTS: We demonstrate the functionality of MASSIF on several human ChIP-seq datasets, using either motifs from the HOCOMOCO database or de novo identified ones as input motifs. In addition, we show that both variants of our method significantly improve the performance of tools that link motifs to TFs based on TF-associated sequences, independent of the considered DBD type. AVAILABILITY AND IMPLEMENTATION: MASSIF is freely available online at https://github.com/SchulzLab/MASSIF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-7703792
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7703792 2020-12-07 Improved linking of motifs to their TFs using domain information Baumgarten, Nina Schmidt, Florian Schulz, Marcel H Bioinformatics Review Oxford University Press 2020-03-15 2019-11-19 /pmc/articles/PMC7703792/ /pubmed/31742324 http://dx.doi.org/10.1093/bioinformatics/btz855 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Improved linking of motifs to their TFs using domain information
topic	Review
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703792/ https://www.ncbi.nlm.nih.gov/pubmed/31742324 http://dx.doi.org/10.1093/bioinformatics/btz855
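
Illustrative sketch (not part of the catalog record): the description field outlines two ways of using the domain score, (i) combining its P-value with the P-values of existing tools via meta-analysis and (ii) filtering out motifs with an unlikely domain score. The short Python sketch below shows what these two steps could look like. It is not the MASSIF implementation. The abstract does not name the meta-analysis method, so Fisher's method is assumed here as one common choice. The function names, the motif names, the example P-values and the 0.05 threshold are all hypothetical.

```python
import math


def fisher_combine(pvalues, floor=1e-300):
    """Combine independent P-values with Fisher's method (assumed technique).

    Fisher's statistic X = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis, where k is the
    number of P-values.
    """
    if not pvalues:
        raise ValueError("need at least one P-value")
    k = len(pvalues)
    # half = X / 2; clamp each P-value to avoid log(0)
    half = -sum(math.log(max(p, floor)) for p in pvalues)
    # Closed-form chi-square survival function for even degrees of freedom:
    # sf(X; 2k) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def combine_with_domain_score(tool_pvalues, domain_pvalues):
    """Variant (i): merge each motif's tool P-values with its domain-score P-value."""
    return {
        motif: fisher_combine(list(ps) + [domain_pvalues[motif]])
        for motif, ps in tool_pvalues.items()
    }


def filter_by_domain_score(tool_pvalues, domain_pvalues, alpha=0.05):
    """Variant (ii): keep only motifs whose domain-score P-value is at most alpha."""
    return {
        motif: ps
        for motif, ps in tool_pvalues.items()
        if domain_pvalues[motif] <= alpha
    }


# Hypothetical input: one P-value per motif from a single motif-linking tool,
# plus a hypothetical domain-score P-value per motif.
tool = {"motif_A": [0.01], "motif_B": [0.02]}
domain = {"motif_A": 0.5, "motif_B": 0.001}

combined = combine_with_domain_score(tool, domain)
kept = filter_by_domain_score(tool, domain)
ranking = sorted(combined, key=combined.get)  # smallest combined P-value first
```

In this made-up example the tool alone slightly prefers motif_A, while the combined P-value prefers motif_B because its domain score is far more significant. The filter variant discards motif_A outright instead of reweighting it.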
