SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents
BACKGROUND: Discovering transcription factor binding sites (TFBS) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences recognized by a transcription factor (TF) is known as a motif, which can be represented in matrix f...
Main authors: | Zhang, Shaoqiang; Zhou, Xiguo; Du, Chuanbin; Su, Zhengchang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866262/ https://www.ncbi.nlm.nih.gov/pubmed/24564945 http://dx.doi.org/10.1186/1752-0509-7-S2-S14 |
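
The summary above refers to motifs in matrix form, and the title refers to information content. As a minimal sketch, assuming a uniform background of 0.25 per base and the standard relative-entropy definition of information content (this record does not give the paper's exact formula), the information content of one position frequency matrix column could be computed as follows. The function names and the pseudocount parameter are hypothetical, chosen only for illustration.

```python
import math

def column_frequencies(counts, pseudocount=0.0):
    """Turn raw A, C, G, T counts for one motif column into frequencies.

    The pseudocount is an illustrative assumption; the record does not
    say how (or whether) the authors smooth their counts.
    """
    total = sum(counts) + pseudocount * len(counts)
    return [(c + pseudocount) / total for c in counts]

def information_content(freqs, background=(0.25, 0.25, 0.25, 0.25)):
    """Relative entropy (in bits) of a column against the background.

    Terms with zero frequency contribute nothing, by the usual
    convention that 0 * log(0) is taken as 0.
    """
    return sum(p * math.log2(p / q)
               for p, q in zip(freqs, background) if p > 0)

# A fully conserved column carries 2 bits; a uniform column carries none.
conserved = column_frequencies([10, 0, 0, 0])
uniform = column_frequencies([5, 5, 5, 5])
print(information_content(conserved), information_content(uniform))
```

Under these assumptions, highly conserved columns receive more weight than weakly conserved ones, which is the general role the title assigns to information content.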
_version_ | 1782296135633731584 |
---|---|
author | Zhang, Shaoqiang Zhou, Xiguo Du, Chuanbin Su, Zhengchang |
author_facet | Zhang, Shaoqiang Zhou, Xiguo Du, Chuanbin Su, Zhengchang |
author_sort | Zhang, Shaoqiang |
collection | PubMed |
description | BACKGROUND: Discovering transcription factor binding sites (TFBS) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences recognized by a transcription factor (TF) is known as a motif, which can be represented in matrix form, such as a position-specific scoring matrix (PSSM) or a position frequency matrix (PFM). Very frequently, we need to query a motif against a database of motifs to find similar motifs, merge similar TFBS motifs possibly recognized by the same TF, separate irrelevant motifs, or filter out spurious motifs. All of these applications therefore require a metric that captures slight differences between irrelevant motifs and highlights the similarity between motifs of the same group. Although several motif similarity metrics have already been proposed, their performance is still far from satisfactory for these applications. METHODS: In this paper we propose a novel metric, SPIC (Similarity with Position Information Contents), for measuring the similarity between a column of one motif and a column of another motif. In defining this similarity score, we consider the likelihood that the column of the first motif's PFM can be produced by the column of the second motif's PSSM, multiply that likelihood by the information content of the column of the second motif's PSSM, and vice versa. We evaluated the performance of SPIC combined with a local or a global alignment method with an affine gap penalty function for computing the similarity between two motifs. We also compared SPIC with seven existing state-of-the-art metrics on three datasets for their ability to cluster motifs from the same group and to retrieve motifs from a database. RESULTS: When used jointly with the Smith-Waterman local alignment method with an affine gap penalty function (gap open penalty equal to 1, gap extension penalty equal to 0.5), SPIC outperforms the seven existing state-of-the-art motif similarity metrics, each combined with its best alignment, in matching motifs in database searches, clustering sub-motifs of the same TF, and distinguishing relevant motifs from a miscellaneous group of motifs. CONCLUSIONS: We have developed a novel motif similarity metric that matches motifs in database searches more accurately, and clusters similar motifs and differentiates irrelevant motifs more effectively, than the other seven metrics we are aware of. |
format | Online Article Text |
id | pubmed-3866262 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-38662622013-12-20 SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents Zhang, Shaoqiang Zhou, Xiguo Du, Chuanbin Su, Zhengchang BMC Syst Biol Research BACKGROUND: Discovering transcription factor binding sites (TFBS) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences recognized by a transcription factor (TF) is known as a motif, which can be represented in matrix form, such as a position-specific scoring matrix (PSSM) or a position frequency matrix (PFM). Very frequently, we need to query a motif against a database of motifs to find similar motifs, merge similar TFBS motifs possibly recognized by the same TF, separate irrelevant motifs, or filter out spurious motifs. All of these applications therefore require a metric that captures slight differences between irrelevant motifs and highlights the similarity between motifs of the same group. Although several motif similarity metrics have already been proposed, their performance is still far from satisfactory for these applications. METHODS: In this paper we propose a novel metric, SPIC (Similarity with Position Information Contents), for measuring the similarity between a column of one motif and a column of another motif. In defining this similarity score, we consider the likelihood that the column of the first motif's PFM can be produced by the column of the second motif's PSSM, multiply that likelihood by the information content of the column of the second motif's PSSM, and vice versa. We evaluated the performance of SPIC combined with a local or a global alignment method with an affine gap penalty function for computing the similarity between two motifs. We also compared SPIC with seven existing state-of-the-art metrics on three datasets for their ability to cluster motifs from the same group and to retrieve motifs from a database. RESULTS: When used jointly with the Smith-Waterman local alignment method with an affine gap penalty function (gap open penalty equal to 1, gap extension penalty equal to 0.5), SPIC outperforms the seven existing state-of-the-art motif similarity metrics, each combined with its best alignment, in matching motifs in database searches, clustering sub-motifs of the same TF, and distinguishing relevant motifs from a miscellaneous group of motifs. CONCLUSIONS: We have developed a novel motif similarity metric that matches motifs in database searches more accurately, and clusters similar motifs and differentiates irrelevant motifs more effectively, than the other seven metrics we are aware of. BioMed Central 2013-12-17 /pmc/articles/PMC3866262/ /pubmed/24564945 http://dx.doi.org/10.1186/1752-0509-7-S2-S14 Text en Copyright © 2013 Zhang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Zhang, Shaoqiang Zhou, Xiguo Du, Chuanbin Su, Zhengchang SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents |
title | SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents |
title_full | SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents |
title_fullStr | SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents |
title_full_unstemmed | SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents |
title_short | SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents |
title_sort | spic: a novel similarity metric for comparing transcription factor binding site motifs based on information contents |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866262/ https://www.ncbi.nlm.nih.gov/pubmed/24564945 http://dx.doi.org/10.1186/1752-0509-7-S2-S14 |
work_keys_str_mv | AT zhangshaoqiang spicanovelsimilaritymetricforcomparingtranscriptionfactorbindingsitemotifsbasedoninformationcontents AT zhouxiguo spicanovelsimilaritymetricforcomparingtranscriptionfactorbindingsitemotifsbasedoninformationcontents AT duchuanbin spicanovelsimilaritymetricforcomparingtranscriptionfactorbindingsitemotifsbasedoninformationcontents AT suzhengchang spicanovelsimilaritymetricforcomparingtranscriptionfactorbindingsitemotifsbasedoninformationcontents |
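
The description field in this record pairs a column-level similarity score with Smith-Waterman local alignment under an affine gap penalty (gap open 1, gap extension 0.5). The sketch below illustrates that combination under several assumptions that are not taken from the paper: `column_score` is a hypothetical stand-in that only loosely follows the abstract's wording (a log-odds likelihood term weighted by information content, averaged over both directions) and is not the published SPIC formula; the background is uniform at 0.25; `eps` is an illustrative floor to avoid taking the log of zero; and the first gap position is charged `gap_open` with each further position charged `gap_extend`, a convention the record does not specify.

```python
import math

def info_content(col, bg=0.25):
    """Relative entropy (bits) of one column against a uniform background."""
    return sum(p * math.log2(p / bg) for p in col if p > 0)

def column_score(a, b, bg=0.25, eps=1e-3):
    """Hypothetical symmetric column score, NOT the published SPIC formula.

    Each direction takes the expected log-odds of one column's bases
    under the other column, weighted by the other column's information
    content; the two directions are then averaged.
    """
    def one_way(x, y):
        log_odds = sum(px * math.log2(max(py, eps) / bg)
                       for px, py in zip(x, y))
        return log_odds * info_content(y, bg)
    return 0.5 * (one_way(a, b) + one_way(b, a))

def local_align_score(m1, m2, score=column_score,
                      gap_open=1.0, gap_extend=0.5):
    """Best Smith-Waterman local alignment score with affine gaps (Gotoh)."""
    n, m = len(m1), len(m2)
    neg = float("-inf")
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # best score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in m1
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in m2
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + score(m1[i - 1], m2[j - 1])
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best

# Toy motifs as lists of [A, C, G, T] frequency columns.
core = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
flank = [0.25, 0.25, 0.25, 0.25]
padded = [flank] + core + [flank]
print(local_align_score(core, padded))
```

Because the alignment is local, the uninformative flanking columns in `padded` do not lower the score: the best local alignment covers only the shared three-column core. Any other column score with the same call signature can be passed through the `score` parameter.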