Set cover-based methods for motif selection
MOTIVATION: De novo motif discovery algorithms find statistically over-represented sequence motifs that may function as transcription factor binding sites. Current methods often report large numbers of motifs, making it difficult to perform further analyses and experimental validation. The motif sel...
Main authors: | Li, Yichao; Liu, Yating; Juedes, David; Drews, Frank; Bunescu, Razvan; Welch, Lonnie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703758/ https://www.ncbi.nlm.nih.gov/pubmed/31665223 http://dx.doi.org/10.1093/bioinformatics/btz697 |
_version_ | 1783616689544364032 |
---|---|
author | Li, Yichao Liu, Yating Juedes, David Drews, Frank Bunescu, Razvan Welch, Lonnie |
author_facet | Li, Yichao Liu, Yating Juedes, David Drews, Frank Bunescu, Razvan Welch, Lonnie |
author_sort | Li, Yichao |
collection | PubMed |
description | MOTIVATION: De novo motif discovery algorithms find statistically over-represented sequence motifs that may function as transcription factor binding sites. Current methods often report large numbers of motifs, making it difficult to perform further analyses and experimental validation. The motif selection problem seeks to identify a minimal set of putative regulatory motifs that characterize sequences of interest (e.g. ChIP-Seq binding regions). RESULTS: In this study, the motif selection problem is mapped to variants of the set cover problem that are solved via tabu search and by relaxed integer linear programming (RILP). The algorithms are employed to analyze 349 ChIP-Seq experiments from the ENCODE project, yielding a small number of high-quality motifs that represent putative binding sites of primary factors and cofactors. Specifically, when compared with the motifs reported by Kheradpour and Kellis, the set cover-based algorithms produced motif sets covering 35% more peaks for 11 TFs and identified 4 more putative cofactors for 6 TFs. Moreover, a systematic evaluation using nested cross-validation revealed that the RILP algorithm selected fewer motifs and was able to cover 6% more peaks and 3% fewer background regions, which reduced the error rate by 7%. AVAILABILITY AND IMPLEMENTATION: The source code of the algorithms and all the datasets are available at https://github.com/YichaoOU/Set_cover_tools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7703758 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-77037582020-12-07 Set cover-based methods for motif selection Li, Yichao Liu, Yating Juedes, David Drews, Frank Bunescu, Razvan Welch, Lonnie Bioinformatics Original Papers MOTIVATION: De novo motif discovery algorithms find statistically over-represented sequence motifs that may function as transcription factor binding sites. Current methods often report large numbers of motifs, making it difficult to perform further analyses and experimental validation. The motif selection problem seeks to identify a minimal set of putative regulatory motifs that characterize sequences of interest (e.g. ChIP-Seq binding regions). RESULTS: In this study, the motif selection problem is mapped to variants of the set cover problem that are solved via tabu search and by relaxed integer linear programming (RILP). The algorithms are employed to analyze 349 ChIP-Seq experiments from the ENCODE project, yielding a small number of high-quality motifs that represent putative binding sites of primary factors and cofactors. Specifically, when compared with the motifs reported by Kheradpour and Kellis, the set cover-based algorithms produced motif sets covering 35% more peaks for 11 TFs and identified 4 more putative cofactors for 6 TFs. Moreover, a systematic evaluation using nested cross-validation revealed that the RILP algorithm selected fewer motifs and was able to cover 6% more peaks and 3% fewer background regions, which reduced the error rate by 7%. AVAILABILITY AND IMPLEMENTATION: The source code of the algorithms and all the datasets are available at https://github.com/YichaoOU/Set_cover_tools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-02-15 2019-09-17 /pmc/articles/PMC7703758/ /pubmed/31665223 http://dx.doi.org/10.1093/bioinformatics/btz697 Text en © The Author(s) 2019. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Li, Yichao Liu, Yating Juedes, David Drews, Frank Bunescu, Razvan Welch, Lonnie Set cover-based methods for motif selection |
title | Set cover-based methods for motif selection |
title_full | Set cover-based methods for motif selection |
title_fullStr | Set cover-based methods for motif selection |
title_full_unstemmed | Set cover-based methods for motif selection |
title_short | Set cover-based methods for motif selection |
title_sort | set cover-based methods for motif selection |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703758/ https://www.ncbi.nlm.nih.gov/pubmed/31665223 http://dx.doi.org/10.1093/bioinformatics/btz697 |
work_keys_str_mv | AT liyichao setcoverbasedmethodsformotifselection AT liuyating setcoverbasedmethodsformotifselection AT juedesdavid setcoverbasedmethodsformotifselection AT drewsfrank setcoverbasedmethodsformotifselection AT bunescurazvan setcoverbasedmethodsformotifselection AT welchlonnie setcoverbasedmethodsformotifselection |
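
The description above frames motif selection as a set cover problem: each motif "covers" the peaks in which it occurs, and the goal is a small set of motifs that covers most peaks. The sketch below illustrates that framing only. It uses the classic greedy set cover heuristic, which is a substitute chosen for brevity and is not the tabu search or RILP method of the paper, and it ignores background regions. The function name `greedy_motif_cover`, the `target_fraction` parameter, and the toy peak and motif identifiers are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_motif_cover(
    motif_hits: Dict[str, Set[str]],
    peaks: Set[str],
    target_fraction: float = 1.0,
) -> List[str]:
    """Greedily pick motifs until the requested fraction of peaks is covered.

    Illustrative heuristic only; not the paper's tabu search or RILP.
    """
    needed = target_fraction * len(peaks)
    uncovered = set(peaks)
    selected: List[str] = []
    while len(peaks) - len(uncovered) < needed:
        # Pick the motif hitting the most still-uncovered peaks;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(motif_hits), key=lambda m: len(motif_hits[m] & uncovered))
        gain = motif_hits[best] & uncovered
        if not gain:
            break  # no remaining motif adds coverage
        selected.append(best)
        uncovered -= gain
    return selected


# Toy data (hypothetical): which peaks contain a hit for each motif.
peaks = {"p1", "p2", "p3", "p4", "p5", "p6"}
motif_hits = {
    "motif_A": {"p1", "p2", "p3", "p4"},
    "motif_B": {"p3", "p4", "p5"},
    "motif_C": {"p5", "p6"},
    "motif_D": {"p1", "p6"},
}
selected = greedy_motif_cover(motif_hits, peaks)
print(selected)
```

On this toy input, two of the four candidate motifs are enough to cover all six peaks. Lowering `target_fraction` trades coverage for a smaller motif set, which loosely mirrors the partial-cover variants the description alludes to.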