Cargando…

Direct vs 2-stage approaches to structured motif finding

BACKGROUND: The notion of DNA motif is a mathematical abstraction used to model regions of the DNA (known as Transcription Factor Binding Sites, or TFBSs) that are bound by a given Transcription Factor to regulate gene expression or repression. In turn, DNA structured motifs are a mathematical count...

Descripción completa

Detalles Bibliográficos
Autores principales:	Federico, Maria, Leoncini, Mauro, Montangero, Manuela, Valente, Paolo
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2012
Materias:	Research
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3564690/ https://www.ncbi.nlm.nih.gov/pubmed/22908910 http://dx.doi.org/10.1186/1748-7188-7-20

_version_	1782258332464054272
author	Federico, Maria Leoncini, Mauro Montangero, Manuela Valente, Paolo
author_facet	Federico, Maria Leoncini, Mauro Montangero, Manuela Valente, Paolo
author_sort	Federico, Maria
collection	PubMed
description	BACKGROUND: The notion of DNA motif is a mathematical abstraction used to model regions of the DNA (known as Transcription Factor Binding Sites, or TFBSs) that are bound by a given Transcription Factor to regulate gene expression or repression. In turn, DNA structured motifs are a mathematical counterpart that models sets of TFBSs that work in concert in the gene regulations processes of higher eukaryotic organisms. Typically, a structured motif is composed of an ordered set of isolated (or simple) motifs, separated by a variable, but somewhat constrained number of “irrelevant” base-pairs. Discovering structured motifs in a set of DNA sequences is a computationally hard problem that has been addressed by a number of authors using either a direct approach, or via the preliminary identification and successive combination of simple motifs. RESULTS: We describe a computational tool, named SISMA, for the de-novo discovery of structured motifs in a set of DNA sequences. SISMA is an exact, enumerative algorithm, meaning that it finds all the motifs conforming to the specifications. It does so in two stages: first it discovers all the possible component simple motifs, then combines them in a way that respects the given constraints. We developed SISMA mainly with the aim of understanding the potential benefits of such a 2-stage approach w.r.t. direct methods. In fact, no 2-stage software was available for the general problem of structured motif discovery, but only a few tools that solved restricted versions of the problem. We evaluated SISMA against other published tools on a comprehensive benchmark made of both synthetic and real biological datasets. In a significant number of cases, SISMA outperformed the competitors, exhibiting a good performance also in most of the cases in which it was inferior. CONCLUSIONS: A reflection on the results obtained lead us to conclude that a 2-stage approach can be implemented with many advantages over direct approaches. Some of these have to do with greater modularity, ease of parallelization, and the possibility to perform adaptive searches of structured motifs. As another consideration, we noted that most hard instances for SISMA were easy to detect in advance. In these cases one may initially opt for a direct method; or, as a viable alternative in most laboratories, one could run both direct and 2-stage tools in parallel, halting the computations when the first halts.
format	Online Article Text
id	pubmed-3564690
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-35646902013-02-08 Direct vs 2-stage approaches to structured motif finding Federico, Maria Leoncini, Mauro Montangero, Manuela Valente, Paolo Algorithms Mol Biol Research BACKGROUND: The notion of DNA motif is a mathematical abstraction used to model regions of the DNA (known as Transcription Factor Binding Sites, or TFBSs) that are bound by a given Transcription Factor to regulate gene expression or repression. In turn, DNA structured motifs are a mathematical counterpart that models sets of TFBSs that work in concert in the gene regulations processes of higher eukaryotic organisms. Typically, a structured motif is composed of an ordered set of isolated (or simple) motifs, separated by a variable, but somewhat constrained number of “irrelevant” base-pairs. Discovering structured motifs in a set of DNA sequences is a computationally hard problem that has been addressed by a number of authors using either a direct approach, or via the preliminary identification and successive combination of simple motifs. RESULTS: We describe a computational tool, named SISMA, for the de-novo discovery of structured motifs in a set of DNA sequences. SISMA is an exact, enumerative algorithm, meaning that it finds all the motifs conforming to the specifications. It does so in two stages: first it discovers all the possible component simple motifs, then combines them in a way that respects the given constraints. We developed SISMA mainly with the aim of understanding the potential benefits of such a 2-stage approach w.r.t. direct methods. In fact, no 2-stage software was available for the general problem of structured motif discovery, but only a few tools that solved restricted versions of the problem. We evaluated SISMA against other published tools on a comprehensive benchmark made of both synthetic and real biological datasets. In a significant number of cases, SISMA outperformed the competitors, exhibiting a good performance also in most of the cases in which it was inferior. CONCLUSIONS: A reflection on the results obtained lead us to conclude that a 2-stage approach can be implemented with many advantages over direct approaches. Some of these have to do with greater modularity, ease of parallelization, and the possibility to perform adaptive searches of structured motifs. As another consideration, we noted that most hard instances for SISMA were easy to detect in advance. In these cases one may initially opt for a direct method; or, as a viable alternative in most laboratories, one could run both direct and 2-stage tools in parallel, halting the computations when the first halts. BioMed Central 2012-08-21 /pmc/articles/PMC3564690/ /pubmed/22908910 http://dx.doi.org/10.1186/1748-7188-7-20 Text en Copyright ©2012 Federico et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Federico, Maria Leoncini, Mauro Montangero, Manuela Valente, Paolo Direct vs 2-stage approaches to structured motif finding
title	Direct vs 2-stage approaches to structured motif finding
title_full	Direct vs 2-stage approaches to structured motif finding
title_fullStr	Direct vs 2-stage approaches to structured motif finding
title_full_unstemmed	Direct vs 2-stage approaches to structured motif finding
title_short	Direct vs 2-stage approaches to structured motif finding
title_sort	direct vs 2-stage approaches to structured motif finding
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3564690/ https://www.ncbi.nlm.nih.gov/pubmed/22908910 http://dx.doi.org/10.1186/1748-7188-7-20
work_keys_str_mv	AT federicomaria directvs2stageapproachestostructuredmotiffinding AT leoncinimauro directvs2stageapproachestostructuredmotiffinding AT montangeromanuela directvs2stageapproachestostructuredmotiffinding AT valentepaolo directvs2stageapproachestostructuredmotiffinding

Direct vs 2-stage approaches to structured motif finding

Ejemplares similares