
New scoring schema for finding motifs in DNA Sequences

BACKGROUND: Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search (or predict) known binding sites in a new DNA s...


Bibliographic Details
Main authors: Zare-Mirakabad, Fatemeh, Ahrabian, Hayedeh, Sadeghi, Mehdei, Nowzari-Dalini, Abbas, Goliaei, Bahram
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2679735/
https://www.ncbi.nlm.nih.gov/pubmed/19302709
http://dx.doi.org/10.1186/1471-2105-10-93
_version_ 1782166916951965696
author Zare-Mirakabad, Fatemeh
Ahrabian, Hayedeh
Sadeghi, Mehdei
Nowzari-Dalini, Abbas
Goliaei, Bahram
author_facet Zare-Mirakabad, Fatemeh
Ahrabian, Hayedeh
Sadeghi, Mehdei
Nowzari-Dalini, Abbas
Goliaei, Bahram
author_sort Zare-Mirakabad, Fatemeh
collection PubMed
description BACKGROUND: Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search (or predict) known binding sites in a new DNA sequence. For this reason, all subsequences of the given DNA sequence are scored based on a scoring function and the prediction is done by selecting the best score. Most of the available tools for known binding site prediction are designed under the assumption that binding site base positions are independent of one another. Recently Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions which can be used in known binding site prediction based on the assumption of dependence or independence in binding site base positions. RESULTS: We propose a new scoring function based on the dependency between all base positions in the binding site. This scoring function uses joint information content and mutual information as a measure of dependency between positions in a transcription factor binding site (TFBS). Our method for modeling dependencies is simply an extension of position-independent methods. We evaluate our new scoring function on real data sets extracted from the JASPAR and TRANSFAC databases, and compare the obtained results with two other well-known scoring functions. CONCLUSION: The results demonstrate that the new approach improves known binding site discovery and show that the joint information content and mutual information provide a better and more general criterion to investigate the relationships between positions in the TFBS. 
Our scoring function is formulated by simple mathematical calculations. By implementing our method on several biological data sets, it can be inferred that this method performs better than methods that do not consider dependencies.
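The abstract names mutual information as its measure of dependency between binding site positions. As a rough illustration only, the sketch below estimates the mutual information (in bits) between two columns of a set of aligned sites from raw frequencies. It is not the authors' scoring function: the function name `mutual_information`, the toy sites, and the absence of pseudocounts or background correction are all assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(sites, i, j):
    """Empirical mutual information (bits) between columns i and j.

    `sites` is a list of equal-length aligned strings. Probabilities are
    plain relative frequencies (no pseudocounts), an assumption of this
    sketch rather than anything stated in the abstract.
    """
    n = len(sites)
    count_i = Counter(s[i] for s in sites)            # base counts in column i
    count_j = Counter(s[j] for s in sites)            # base counts in column j
    count_ij = Counter((s[i], s[j]) for s in sites)   # joint pair counts

    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 always change together,
# while column 3 varies independently of column 0.
sites = ["ACGT", "ACGA", "TGGT", "TGGA"]
mi_01 = mutual_information(sites, 0, 1)
mi_03 = mutual_information(sites, 0, 3)
print(mi_01, mi_03)
```

In this toy alignment the fully coupled column pair carries one bit of mutual information and the independent pair carries none, which is the kind of signal a dependency-aware score can exploit and a position-independent score ignores.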
format Text
id pubmed-2679735
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-26797352009-05-09 New scoring schema for finding motifs in DNA Sequences Zare-Mirakabad, Fatemeh Ahrabian, Hayedeh Sadeghi, Mehdei Nowzari-Dalini, Abbas Goliaei, Bahram BMC Bioinformatics Research Article BACKGROUND: Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search (or predict) known binding sites in a new DNA sequence. For this reason, all subsequences of the given DNA sequence are scored based on a scoring function and the prediction is done by selecting the best score. Most of the available tools for known binding site prediction are designed under the assumption that binding site base positions are independent of one another. Recently Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions which can be used in known binding site prediction based on the assumption of dependence or independence in binding site base positions. RESULTS: We propose a new scoring function based on the dependency between all base positions in the binding site. This scoring function uses joint information content and mutual information as a measure of dependency between positions in a transcription factor binding site (TFBS). Our method for modeling dependencies is simply an extension of position-independent methods. We evaluate our new scoring function on real data sets extracted from the JASPAR and TRANSFAC databases, and compare the obtained results with two other well-known scoring functions. 
CONCLUSION: The results demonstrate that the new approach improves known binding site discovery and show that the joint information content and mutual information provide a better and more general criterion to investigate the relationships between positions in the TFBS. Our scoring function is formulated by simple mathematical calculations. By implementing our method on several biological data sets, it can be inferred that this method performs better than methods that do not consider dependencies. BioMed Central 2009-03-20 /pmc/articles/PMC2679735/ /pubmed/19302709 http://dx.doi.org/10.1186/1471-2105-10-93 Text en Copyright © 2009 Zare-Mirakabad et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Zare-Mirakabad, Fatemeh
Ahrabian, Hayedeh
Sadeghi, Mehdei
Nowzari-Dalini, Abbas
Goliaei, Bahram
New scoring schema for finding motifs in DNA Sequences
title New scoring schema for finding motifs in DNA Sequences
title_full New scoring schema for finding motifs in DNA Sequences
title_fullStr New scoring schema for finding motifs in DNA Sequences
title_full_unstemmed New scoring schema for finding motifs in DNA Sequences
title_short New scoring schema for finding motifs in DNA Sequences
title_sort new scoring schema for finding motifs in dna sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2679735/
https://www.ncbi.nlm.nih.gov/pubmed/19302709
http://dx.doi.org/10.1186/1471-2105-10-93
work_keys_str_mv AT zaremirakabadfatemeh newscoringschemaforfindingmotifsindnasequences
AT ahrabianhayedeh newscoringschemaforfindingmotifsindnasequences
AT sadeghimehdei newscoringschemaforfindingmotifsindnasequences
AT nowzaridaliniabbas newscoringschemaforfindingmotifsindnasequences
AT goliaeibahram newscoringschemaforfindingmotifsindnasequences