
DiNAMO: highly sensitive DNA motif discovery in high-throughput sequencing data

BACKGROUND: Discovering over-represented approximate motifs in DNA sequences is an essential part of bioinformatics. This topic has been studied extensively because of the increasing number of potential applications. However, it remains a difficult challenge, especially with the huge quantity of dat...


Bibliographic Details
Main Authors: Saad, Chadi, Noé, Laurent, Richard, Hugues, Leclerc, Julie, Buisine, Marie-Pierre, Touzet, Hélène, Figeac, Martin
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5996464/
https://www.ncbi.nlm.nih.gov/pubmed/29890948
http://dx.doi.org/10.1186/s12859-018-2215-1
author Saad, Chadi
Noé, Laurent
Richard, Hugues
Leclerc, Julie
Buisine, Marie-Pierre
Touzet, Hélène
Figeac, Martin
collection PubMed
description BACKGROUND: Discovering over-represented approximate motifs in DNA sequences is an essential part of bioinformatics. This topic has been studied extensively because of the increasing number of potential applications. However, it remains a difficult challenge, especially with the huge quantity of data generated by high-throughput sequencing technologies. To overcome this problem, existing tools use greedy algorithms and probabilistic approaches to find motifs in reasonable time. Nevertheless, these approaches lack sensitivity and have difficulty coping with rare and subtle motifs. RESULTS: We developed DiNAMO (for DNA MOtif), new software based on an exhaustive and efficient algorithm for IUPAC motif discovery. We evaluated DiNAMO on synthetic and real datasets with two different applications, namely ChIP-seq peaks and Systematic Sequencing Error analysis. DiNAMO compares favorably with other existing methods and is robust to noise. CONCLUSIONS: We show that the DiNAMO software can serve as a tool to search for degenerate motifs in an exact manner using IUPAC models. DiNAMO can be used in scanning mode with sliding windows or in fixed position mode, which makes it suitable for numerous potential applications. AVAILABILITY: https://github.com/bonsai-team/DiNAMO. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2215-1) contains supplementary material, which is available to authorized users.
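The abstract's notions of a degenerate IUPAC motif, scanning mode, and fixed position mode can be illustrated with a short sketch. This is not DiNAMO's algorithm or interface: the function names (`matches_at`, `occurs`, `support`), the toy sequences, and the motif `YTGCA` are all hypothetical and chosen only for illustration. The only external fact used is the standard IUPAC nucleotide code table. The sketch counts how many sequences in a positive and a negative set contain a given motif, either at any offset or at one fixed offset.

```python
# Illustrative sketch only; NOT the DiNAMO implementation.
# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_at(motif, seq, pos):
    """Return True if the IUPAC motif matches seq starting at index pos."""
    if pos < 0 or pos + len(motif) > len(seq):
        return False
    return all(seq[pos + i] in IUPAC[code] for i, code in enumerate(motif))


def occurs(motif, seq, fixed_pos=None):
    """Scanning mode (fixed_pos is None) tries every offset in seq;
    fixed position mode checks the single offset fixed_pos."""
    if fixed_pos is not None:
        return matches_at(motif, seq, fixed_pos)
    last_start = len(seq) - len(motif)
    return any(matches_at(motif, seq, p) for p in range(last_start + 1))


def support(motif, seqs, fixed_pos=None):
    """Number of sequences containing at least one occurrence of motif."""
    return sum(1 for s in seqs if occurs(motif, s, fixed_pos))


# Hypothetical toy data: the motif is planted at offset 3 in each positive.
positive = ["ACGTTGCA", "TTGCTGCA", "GGGCTGCA"]
negative = ["ACGTACGT", "TTTTTTTT", "CTGCAAAA"]
motif = "YTGCA"  # Y stands for C or T

print("scanning:", support(motif, positive), support(motif, negative))
print("fixed at 3:", support(motif, positive, 3), support(motif, negative, 3))
```

A motif that is frequent in the positive set and rare in the negative set is a candidate for over-representation; deciding whether the difference is significant would require a statistical test, which this sketch leaves out.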
format Online
Article
Text
id pubmed-5996464
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5996464 2018-06-25 DiNAMO: highly sensitive DNA motif discovery in high-throughput sequencing data Saad, Chadi Noé, Laurent Richard, Hugues Leclerc, Julie Buisine, Marie-Pierre Touzet, Hélène Figeac, Martin BMC Bioinformatics Methodology Article BACKGROUND: Discovering over-represented approximate motifs in DNA sequences is an essential part of bioinformatics. This topic has been studied extensively because of the increasing number of potential applications. However, it remains a difficult challenge, especially with the huge quantity of data generated by high-throughput sequencing technologies. To overcome this problem, existing tools use greedy algorithms and probabilistic approaches to find motifs in reasonable time. Nevertheless, these approaches lack sensitivity and have difficulty coping with rare and subtle motifs. RESULTS: We developed DiNAMO (for DNA MOtif), new software based on an exhaustive and efficient algorithm for IUPAC motif discovery. We evaluated DiNAMO on synthetic and real datasets with two different applications, namely ChIP-seq peaks and Systematic Sequencing Error analysis. DiNAMO compares favorably with other existing methods and is robust to noise. CONCLUSIONS: We show that the DiNAMO software can serve as a tool to search for degenerate motifs in an exact manner using IUPAC models. DiNAMO can be used in scanning mode with sliding windows or in fixed position mode, which makes it suitable for numerous potential applications. AVAILABILITY: https://github.com/bonsai-team/DiNAMO. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2215-1) contains supplementary material, which is available to authorized users.
BioMed Central 2018-06-11 /pmc/articles/PMC5996464/ /pubmed/29890948 http://dx.doi.org/10.1186/s12859-018-2215-1 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title DiNAMO: highly sensitive DNA motif discovery in high-throughput sequencing data
topic Methodology Article