Set cover-based methods for motif selection

Bibliographic Details
Main authors:	Li, Yichao; Liu, Yating; Juedes, David; Drews, Frank; Bunescu, Razvan; Welch, Lonnie
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703758/ https://www.ncbi.nlm.nih.gov/pubmed/31665223 http://dx.doi.org/10.1093/bioinformatics/btz697

collection	PubMed
description	MOTIVATION: De novo motif discovery algorithms find statistically over-represented sequence motifs that may function as transcription factor binding sites. Current methods often report large numbers of motifs, making it difficult to perform further analyses and experimental validation. The motif selection problem seeks to identify a minimal set of putative regulatory motifs that characterize sequences of interest (e.g. ChIP-Seq binding regions).
RESULTS: In this study, the motif selection problem is mapped to variants of the set cover problem that are solved via tabu search and by relaxed integer linear programming (RILP). The algorithms are employed to analyze 349 ChIP-Seq experiments from the ENCODE project, yielding a small number of high-quality motifs that represent putative binding sites of primary factors and cofactors. Specifically, when compared with the motifs reported by Kheradpour and Kellis, the set cover-based algorithms produced motif sets covering 35% more peaks for 11 TFs and identified 4 more putative cofactors for 6 TFs. Moreover, a systematic evaluation using nested cross-validation revealed that the RILP algorithm selected fewer motifs and was able to cover 6% more peaks and 3% fewer background regions, which reduced the error rate by 7%.
AVAILABILITY AND IMPLEMENTATION: The source code of the algorithms and all the datasets are available at https://github.com/YichaoOU/Set_cover_tools.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
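The mapping from motif selection to set cover described in the abstract can be illustrated with a small sketch. Note that this is not the authors' tabu search or RILP method: it uses the classic greedy set cover heuristic as a stand-in, and the function name `greedy_motif_cover`, the `min_coverage` parameter, and the toy motif and peak labels are all hypothetical. Each motif is treated as the set of peaks in which it occurs, and the goal is to pick few motifs whose union covers the peaks.

```python
from typing import Dict, List, Set


def greedy_motif_cover(
    motif_hits: Dict[str, Set[str]],
    peaks: Set[str],
    min_coverage: float = 1.0,
) -> List[str]:
    """Greedy set cover heuristic (an illustrative stand-in, not the paper's method).

    motif_hits maps a motif name to the set of peaks containing that motif.
    Motifs are added until at least `min_coverage` of the peaks are covered
    or no remaining motif covers any uncovered peak.
    """
    if not peaks:
        return []
    target = min_coverage * len(peaks)
    uncovered = set(peaks)
    selected: List[str] = []
    while len(peaks) - len(uncovered) < target:
        # Pick the motif covering the most uncovered peaks; iterating in
        # sorted order makes ties resolve to the alphabetically first name.
        best = max(
            sorted(motif_hits),
            key=lambda m: len(motif_hits[m] & uncovered),
            default=None,
        )
        if best is None or not (motif_hits[best] & uncovered):
            break  # nothing left that improves coverage
        selected.append(best)
        uncovered -= motif_hits[best]
    return selected


# Toy data (hypothetical): four candidate motifs over five peaks.
toy_peaks = {"p1", "p2", "p3", "p4", "p5"}
toy_hits = {
    "A": {"p1", "p2", "p3"},
    "B": {"p3", "p4"},
    "C": {"p4", "p5"},
    "D": {"p1"},
}
selected = greedy_motif_cover(toy_hits, toy_peaks)  # ["A", "C"]
```

On the toy data, motif A is chosen first because it covers three peaks, and C then covers the remaining two, so the redundant motifs B and D are dropped. The variants mentioned in the abstract additionally account for background regions, which this sketch ignores.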
id	pubmed-7703758
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Bioinformatics
dates	2020-02-15 2019-09-17
license	© The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
