Jaccard index based similarity measure to compare transcription factor binding site models
Main authors: | Vorontsov, Ilya E; Kulakovskiy, Ivan V; Makeev, Vsevolod J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3851813/ https://www.ncbi.nlm.nih.gov/pubmed/24074225 http://dx.doi.org/10.1186/1748-7188-8-23 |
_version_ | 1782294356814725120 |
---|---|
author | Vorontsov, Ilya E; Kulakovskiy, Ivan V; Makeev, Vsevolod J |
collection | PubMed |
description | BACKGROUND: The positional weight matrix (PWM) remains the most popular model for quantifying transcription factor (TF) binding. A PWM supplied with a score threshold defines a set of putative transcription factor binding sites (TFBS), thus providing a TFBS model. TF-bound DNA fragments obtained by different experimental methods usually yield similar but not identical PWMs. The same is common for different TFs from the same structural family. Thus it is often necessary to measure the similarity between PWMs. Popular tools compare PWMs directly using their matrix elements. Yet, for log-odds PWMs, negative elements do not contribute to the scores of high-scoring TFBS and thus may differ without affecting the sets of best-recognized binding sites. Moreover, how much the two TFBS sets recognized by a given pair of PWMs differ depends on the score thresholds. RESULTS: We propose a practical approach for comparing two TFBS models, each consisting of a PWM and the respective score threshold. The proposed measure is a variant of the Jaccard index between two TFBS sets. The measure defines a metric space for TFBS models of all finite lengths. The algorithm can compare TFBS models constructed using substantially different approaches, such as PWMs with raw positional counts and log-odds PWMs. We present an efficient software implementation: MACRO-APE (MAtrix CompaRisOn by Approximate P-value Estimation). CONCLUSIONS: MACRO-APE can be used effectively to compute the Jaccard index based similarity for two TFBS models. A two-pass scanning algorithm is presented to search a given collection of PWMs for those similar to a given query. AVAILABILITY AND IMPLEMENTATION: MACRO-APE is implemented in Ruby 1.9; the software, including source code and a manual, is freely available at http://autosome.ru/macroape/ and in the supplementary materials. |
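To make the set-based comparison described in the abstract concrete, here is a minimal brute-force sketch in Python. It is not MACRO-APE (which is written in Ruby and relies on approximate P-value estimation) and does not reproduce its comparison of models of different lengths. It assumes two PWMs of equal, very short length, a uniform background, a single DNA strand, and hypothetical helper names (`site_set`, `threshold_for_pvalue`, `jaccard`, `column`). Each model's TFBS set is every word scoring at or above its threshold, and the similarity is the plain Jaccard index |A ∩ B| / |A ∪ B|. Picking both thresholds at the same uniform-background P-value is an assumption of this sketch, used to put the two PWMs on a comparable footing.

```python
from itertools import product

ALPHABET = "ACGT"


def word_score(pwm, word):
    """Score of a word: sum of the PWM column entries for its letters."""
    return sum(column[letter] for column, letter in zip(pwm, word))


def all_words(length):
    """Every DNA word of the given length (4**length words; small lengths only)."""
    return ("".join(letters) for letters in product(ALPHABET, repeat=length))


def site_set(pwm, threshold):
    """TFBS set of a model (PWM + threshold): all words scoring >= threshold."""
    return {w for w in all_words(len(pwm)) if word_score(pwm, w) >= threshold}


def threshold_for_pvalue(pwm, pvalue):
    """Lowest achievable threshold whose P-value does not exceed pvalue.

    Under the assumed uniform background, the P-value of a threshold is the
    fraction of all words scoring >= it. Returns None if even the top score
    is reached by too large a fraction of words.
    """
    total = len(ALPHABET) ** len(pwm)
    scores = sorted((word_score(pwm, w) for w in all_words(len(pwm))), reverse=True)
    best = None
    i = 0
    while i < len(scores):
        current = scores[i]
        # Skip all words tied at this score; afterwards i words score >= current.
        while i < len(scores) and scores[i] == current:
            i += 1
        if i / total > pvalue:
            break
        best = current
    return best


def jaccard(set_a, set_b):
    """Jaccard index |A & B| / |A | B|; two empty sets count as identical."""
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)


def column(preferred):
    """Hypothetical toy PWM column: +2 for preferred letters, -1 otherwise."""
    return {letter: (2 if letter in preferred else -1) for letter in ALPHABET}


# Toy models: pwm_b differs from pwm_a only by also accepting T at the end.
pwm_a = [column("A"), column("C"), column("G")]
pwm_b = [column("A"), column("C"), column("GT")]

# Compare both models at the same P-value, 2/64 under a uniform background.
ta = threshold_for_pvalue(pwm_a, 2 / 64)
tb = threshold_for_pvalue(pwm_b, 2 / 64)
print(ta, tb, jaccard(site_set(pwm_a, ta), site_set(pwm_b, tb)))  # prints: 6 6 0.5
```

Here pwm_a recognizes only ACG and pwm_b recognizes ACG and ACT, so the index is 1/2. The quantity 1 - J is the Jaccard distance, which satisfies the triangle inequality on finite sets. Exhaustive enumeration costs 4^length score evaluations, so this sketch is only usable for very short PWMs.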
format | Online Article Text |
id | pubmed-3851813 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3851813 2013-12-20 Algorithms Mol Biol Software Article BioMed Central 2013-09-30 Text en Copyright © 2013 Vorontsov et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Jaccard index based similarity measure to compare transcription factor binding site models |
topic | Software Article |