SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents

BACKGROUND: Discovering transcription factor binding sites (TFBS) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences recognized by a transcription factor (TF) is known as a motif, which can be expressed accurately in matrix f...

Bibliographic Details
Main Authors: Zhang, Shaoqiang; Zhou, Xiguo; Du, Chuanbin; Su, Zhengchang
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866262/
https://www.ncbi.nlm.nih.gov/pubmed/24564945
http://dx.doi.org/10.1186/1752-0509-7-S2-S14
_version_ 1782296135633731584
author Zhang, Shaoqiang
Zhou, Xiguo
Du, Chuanbin
Su, Zhengchang
author_facet Zhang, Shaoqiang
Zhou, Xiguo
Du, Chuanbin
Su, Zhengchang
author_sort Zhang, Shaoqiang
collection PubMed
description BACKGROUND: Discovering transcription factor binding sites (TFBS) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences recognized by a transcription factor (TF) is known as a motif, which can be represented in matrix form, such as a position-specific scoring matrix (PSSM) or a position frequency matrix (PFM). Very frequently, we need to query a database of motifs for motifs similar to a given one, merge similar TFBS motifs possibly recognized by the same TF, separate irrelevant motifs, or filter out spurious motifs. All these applications therefore require a metric that captures slight differences between irrelevant motifs and highlights the similarity between motifs of the same group. Although several motif similarity metrics have already been proposed, their performance is still far from satisfactory for these applications. METHODS: We propose a novel metric, SPIC (Similarity with Position Information Contents), for measuring the similarity between a column of one motif and a column of another motif. In defining this similarity score, we consider the likelihood that the column of the first motif's PFM can be produced by the column of the second motif's PSSM, multiply that likelihood by the information content of the column of the second motif's PSSM, and vice versa. We evaluated the performance of SPIC combined with a local or a global alignment method with an affine gap penalty function for computing the similarity between two motifs. We also compared SPIC with seven existing state-of-the-art metrics for their ability to cluster motifs from the same group and to retrieve motifs from a database, using three datasets.
RESULTS: When used jointly with the Smith-Waterman local alignment method with an affine gap penalty function (gap open penalty equal to 1, gap extension penalty equal to 0.5), SPIC outperforms the seven existing state-of-the-art motif similarity metrics, each combined with its best alignment method, at matching motifs in database searches, and at clustering the same TF's sub-motifs or distinguishing relevant ones from a miscellaneous group of motifs. CONCLUSIONS: We have developed a novel motif similarity metric that matches motifs in database searches more accurately, and clusters similar motifs and differentiates irrelevant motifs more effectively, than the seven other metrics we are aware of.
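Illustrative sketch (not part of the catalogued record): the abstract above describes scoring motif columns by a likelihood weighted by information content, then aligning motifs with Smith-Waterman under an affine gap penalty. The Python below is a minimal sketch of that general idea under stated assumptions. The column score is a stand-in that uses the expected log-odds as the "likelihood" term; it is not the published SPIC formula, which the abstract does not give. The uniform background, the pseudocount value, all function names, and the gap convention (the first gap position costs `gap_open`, each further position costs `gap_extend`) are assumptions made for illustration.

```python
import math

BACKGROUND = (0.25, 0.25, 0.25, 0.25)  # assumed uniform A, C, G, T background
PSEUDOCOUNT = 0.01                      # assumed; keeps log-odds finite


def to_probs(counts):
    """Turn one PFM column (counts for A, C, G, T) into smoothed probabilities."""
    total = sum(counts) + 4 * PSEUDOCOUNT
    return [(c + PSEUDOCOUNT) / total for c in counts]


def log_odds(probs):
    """PSSM column: log2 ratio of each base probability to the background."""
    return [math.log2(p / q) for p, q in zip(probs, BACKGROUND)]


def information_content(probs):
    """Relative entropy (in bits) of a column against the background."""
    return sum(p * s for p, s in zip(probs, log_odds(probs)))


def column_score(col_a, col_b):
    """Illustrative symmetric column score; NOT the published SPIC formula."""
    pa, pb = to_probs(col_a), to_probs(col_b)
    # Expected log-odds of one column under the other column's PSSM.
    a_given_b = sum(p * s for p, s in zip(pa, log_odds(pb)))
    b_given_a = sum(p * s for p, s in zip(pb, log_odds(pa)))
    # Weight each direction by the information content of the scoring column.
    return a_given_b * information_content(pb) + b_given_a * information_content(pa)


def local_alignment_score(motif_a, motif_b, gap_open=1.0, gap_extend=0.5):
    """Best Smith-Waterman score over motif columns with affine gaps (Gotoh)."""
    n, m = len(motif_a), len(motif_b)
    neg = float("-inf")
    # H: best local score ending at (i, j); E and F: alignments ending in a gap.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            match = H[i - 1][j - 1] + column_score(motif_a[i - 1], motif_b[j - 1])
            H[i][j] = max(0.0, match, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

Here a motif is a list of columns, each a list of four counts in A, C, G, T order, for example `[[10, 0, 0, 0], [0, 10, 0, 0]]`. Under these assumptions, two identical well-conserved columns score positively, two conserved columns favoring different bases score negatively, and a local alignment of a motif against a longer motif that contains it scores the same as aligning it with itself.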
format Online
Article
Text
id pubmed-3866262
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-38662622013-12-20 SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents Zhang, Shaoqiang Zhou, Xiguo Du, Chuanbin Su, Zhengchang BMC Syst Biol Research BACKGROUND: Discovering transcription factor binding sites (TFBS) is one of the primary challenges in deciphering the complex gene regulatory networks encoded in a genome. A set of short DNA sequences recognized by a transcription factor (TF) is known as a motif, which can be represented in matrix form, such as a position-specific scoring matrix (PSSM) or a position frequency matrix (PFM). Very frequently, we need to query a database of motifs for motifs similar to a given one, merge similar TFBS motifs possibly recognized by the same TF, separate irrelevant motifs, or filter out spurious motifs. All these applications therefore require a metric that captures slight differences between irrelevant motifs and highlights the similarity between motifs of the same group. Although several motif similarity metrics have already been proposed, their performance is still far from satisfactory for these applications. METHODS: We propose a novel metric, SPIC (Similarity with Position Information Contents), for measuring the similarity between a column of one motif and a column of another motif. In defining this similarity score, we consider the likelihood that the column of the first motif's PFM can be produced by the column of the second motif's PSSM, multiply that likelihood by the information content of the column of the second motif's PSSM, and vice versa. We evaluated the performance of SPIC combined with a local or a global alignment method with an affine gap penalty function for computing the similarity between two motifs.
We also compared SPIC with seven existing state-of-the-art metrics for their ability to cluster motifs from the same group and to retrieve motifs from a database, using three datasets. RESULTS: When used jointly with the Smith-Waterman local alignment method with an affine gap penalty function (gap open penalty equal to 1, gap extension penalty equal to 0.5), SPIC outperforms the seven existing state-of-the-art motif similarity metrics, each combined with its best alignment method, at matching motifs in database searches, and at clustering the same TF's sub-motifs or distinguishing relevant ones from a miscellaneous group of motifs. CONCLUSIONS: We have developed a novel motif similarity metric that matches motifs in database searches more accurately, and clusters similar motifs and differentiates irrelevant motifs more effectively, than the seven other metrics we are aware of. BioMed Central 2013-12-17 /pmc/articles/PMC3866262/ /pubmed/24564945 http://dx.doi.org/10.1186/1752-0509-7-S2-S14 Text en Copyright © 2013 Zhang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Zhang, Shaoqiang
Zhou, Xiguo
Du, Chuanbin
Su, Zhengchang
SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents
title SPIC: A novel similarity metric for comparing transcription factor binding site motifs based on information contents
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866262/
https://www.ncbi.nlm.nih.gov/pubmed/24564945
http://dx.doi.org/10.1186/1752-0509-7-S2-S14