
Jaccard index based similarity measure to compare transcription factor binding site models

Bibliographic Details
Main authors: Vorontsov, Ilya E, Kulakovskiy, Ivan V, Makeev, Vsevolod J
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3851813/
https://www.ncbi.nlm.nih.gov/pubmed/24074225
http://dx.doi.org/10.1186/1748-7188-8-23
author Vorontsov, Ilya E
Kulakovskiy, Ivan V
Makeev, Vsevolod J
collection PubMed
description BACKGROUND: The positional weight matrix (PWM) remains the most popular model for quantifying transcription factor (TF) binding. A PWM supplied with a score threshold defines a set of putative transcription factor binding sites (TFBS), thus providing a TFBS model. TF-binding DNA fragments obtained by different experimental methods usually yield similar but not identical PWMs. The same is common for different TFs from the same structural family. Thus it is often necessary to measure the similarity between PWMs. Popular tools compare PWMs directly, using their matrix elements. Yet, for log-odds PWMs, negative elements do not contribute to the scores of high-scoring TFBS and thus may differ without affecting the sets of best-recognized binding sites. Moreover, the two TFBS sets recognized by a given pair of PWMs can be more or less different depending on the score thresholds. RESULTS: We propose a practical approach for comparing two TFBS models, each consisting of a PWM and the respective scoring threshold. The proposed measure is a variant of the Jaccard index between two TFBS sets. The measure defines a metric space for TFBS models of all finite lengths. The algorithm can compare TFBS models constructed using substantially different approaches, such as PWMs with raw positional counts and log-odds PWMs. We present an efficient software implementation: MACRO-APE (MAtrix CompaRisOn by Approximate P-value Estimation). CONCLUSIONS: MACRO-APE can be used effectively to compute the Jaccard index based similarity of two TFBS models. A two-pass scanning algorithm is presented to search a given collection of PWMs for those similar to a given query. AVAILABILITY AND IMPLEMENTATION: MACRO-APE is implemented in Ruby 1.9; the software, including source code and a manual, is freely available at http://autosome.ru/macroape/ and in the supplementary materials.
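The central idea of the abstract is that a PWM plus a threshold defines a set of words (the TFBS model), and two models are compared by the Jaccard index of their sets. The brute-force sketch below illustrates that idea. It is not MACRO-APE's algorithm. It makes several assumptions: it enumerates all 4^L words of one fixed length L, assumes a uniform background, uses a plain unweighted Jaccard index (the paper's measure is described only as "a variant"), ignores shifts between motifs and reverse-complement orientation, and picks each threshold by an exact P-value. The helper names (`log_odds_pwm`, `site_set`, `threshold_for_pvalue`, `jaccard`), the ACGT column order, the pseudocount of 1 and the toy counts are all hypothetical.

```python
from collections import Counter
from itertools import product
from math import inf, log

ALPHABET = "ACGT"  # assumed column order of every matrix row


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn a positional count matrix (one row per position) into a log-odds
    PWM, assuming a uniform background and an additive pseudocount."""
    pwm = []
    for row in counts:
        total = sum(row) + len(ALPHABET) * pseudocount
        pwm.append([log((c + pseudocount) / total / background) for c in row])
    return pwm


def score(pwm, word):
    """PWM score of a word: sum of the elements picked by its nucleotides."""
    return sum(row[ALPHABET.index(nt)] for row, nt in zip(pwm, word))


def all_words(length):
    """Every DNA word of the given length, as strings."""
    return ("".join(w) for w in product(ALPHABET, repeat=length))


def site_set(pwm, threshold):
    """The TFBS model: all words of length len(pwm) scoring >= threshold."""
    return {w for w in all_words(len(pwm)) if score(pwm, w) >= threshold}


def threshold_for_pvalue(pwm, pvalue):
    """Lowest threshold whose site set holds at most `pvalue` of all 4**L
    words, i.e. an exact P-value under a uniform background."""
    freq = Counter(score(pwm, w) for w in all_words(len(pwm)))
    limit = pvalue * sum(freq.values())
    threshold, covered = inf, 0  # inf gives an empty set if no score qualifies
    for s in sorted(freq, reverse=True):
        covered += freq[s]  # words scoring >= s
        if covered > limit:
            break
        threshold = s
    return threshold


def jaccard(a, b):
    """|A & B| / |A | B|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    counts = [[8, 1, 1, 0], [0, 9, 1, 0], [1, 1, 7, 1]]  # toy counts only
    raw = [[float(c) for c in row] for row in counts]  # raw-count "PWM"
    log_odds = log_odds_pwm(counts)
    p = 0.05
    sites_raw = site_set(raw, threshold_for_pvalue(raw, p))
    sites_lo = site_set(log_odds, threshold_for_pvalue(log_odds, p))
    print(len(sites_raw), len(sites_lo), jaccard(sites_raw, sites_lo))
```

Each threshold is set by P-value rather than by raw score. This lets a raw-count matrix and its log-odds transform be compared on equal footing, which is the kind of comparison the abstract describes. Exhaustive enumeration grows as 4^L, so this approach is only practical for short motifs. The Jaccard distance 1 - J is a metric on finite sets.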
format Online
Article
Text
id pubmed-3851813
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3851813 2013-12-20 Jaccard index based similarity measure to compare transcription factor binding site models Vorontsov, Ilya E Kulakovskiy, Ivan V Makeev, Vsevolod J Algorithms Mol Biol Software Article BioMed Central 2013-09-30 /pmc/articles/PMC3851813/ /pubmed/24074225 http://dx.doi.org/10.1186/1748-7188-8-23 Text en Copyright © 2013 Vorontsov et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Jaccard index based similarity measure to compare transcription factor binding site models
topic Software Article