Finding motifs using DNA images derived from sparse representations

MOTIVATION: Motifs play a crucial role in computational biology, as they provide valuable information about the binding specificity of proteins. However, conventional motif discovery methods typically rely on simple combinatoric or probabilistic approaches, which can be biased by heuristics such as...

Bibliographic Details
Main Authors:	Chu, Shane K; Stormo, Gary D
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10290554/ https://www.ncbi.nlm.nih.gov/pubmed/37294804 http://dx.doi.org/10.1093/bioinformatics/btad378

_version_	1785062516183793664
author	Chu, Shane K Stormo, Gary D
author_facet	Chu, Shane K Stormo, Gary D
author_sort	Chu, Shane K
collection	PubMed
description	MOTIVATION: Motifs play a crucial role in computational biology, as they provide valuable information about the binding specificity of proteins. However, conventional motif discovery methods typically rely on simple combinatoric or probabilistic approaches, which can be biased by heuristics such as substring-masking for multiple motif discovery. In recent years, deep neural networks have become increasingly popular for motif discovery, as they are capable of capturing complex patterns in data. Nonetheless, inferring motifs from neural networks remains a challenging problem, both from a modeling and computational standpoint, despite the success of these networks in supervised learning tasks. RESULTS: We present a principled representation learning approach based on a hierarchical sparse representation for motif discovery. Our method effectively discovers gapped, long, or overlapping motifs that we show to commonly exist in next-generation sequencing datasets, in addition to the short and enriched primary binding sites. Our model is fully interpretable, fast, and capable of capturing motifs in a large number of DNA strings. A key concept emerged from our approach—enumerating at the image level—effectively overcomes the k-mers paradigm, enabling modest computational resources for capturing the long and varied but conserved patterns, in addition to capturing the primary binding sites. AVAILABILITY AND IMPLEMENTATION: Our method is available as a Julia package under the MIT license at https://github.com/kchu25/MOTIFs.jl, and the results on experimental data can be found at https://zenodo.org/record/7783033.
format	Online Article Text
id	pubmed-10290554
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-102905542023-06-25 Finding motifs using DNA images derived from sparse representations Chu, Shane K Stormo, Gary D Bioinformatics Original Paper MOTIVATION: Motifs play a crucial role in computational biology, as they provide valuable information about the binding specificity of proteins. However, conventional motif discovery methods typically rely on simple combinatoric or probabilistic approaches, which can be biased by heuristics such as substring-masking for multiple motif discovery. In recent years, deep neural networks have become increasingly popular for motif discovery, as they are capable of capturing complex patterns in data. Nonetheless, inferring motifs from neural networks remains a challenging problem, both from a modeling and computational standpoint, despite the success of these networks in supervised learning tasks. RESULTS: We present a principled representation learning approach based on a hierarchical sparse representation for motif discovery. Our method effectively discovers gapped, long, or overlapping motifs that we show to commonly exist in next-generation sequencing datasets, in addition to the short and enriched primary binding sites. Our model is fully interpretable, fast, and capable of capturing motifs in a large number of DNA strings. A key concept emerged from our approach—enumerating at the image level—effectively overcomes the k-mers paradigm, enabling modest computational resources for capturing the long and varied but conserved patterns, in addition to capturing the primary binding sites. AVAILABILITY AND IMPLEMENTATION: Our method is available as a Julia package under the MIT license at https://github.com/kchu25/MOTIFs.jl, and the results on experimental data can be found at https://zenodo.org/record/7783033. Oxford University Press 2023-06-09 /pmc/articles/PMC10290554/ /pubmed/37294804 http://dx.doi.org/10.1093/bioinformatics/btad378 Text en © The Author(s) 2023. Published by Oxford University Press. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Paper Chu, Shane K Stormo, Gary D Finding motifs using DNA images derived from sparse representations
title	Finding motifs using DNA images derived from sparse representations
title_full	Finding motifs using DNA images derived from sparse representations
title_fullStr	Finding motifs using DNA images derived from sparse representations
title_full_unstemmed	Finding motifs using DNA images derived from sparse representations
title_short	Finding motifs using DNA images derived from sparse representations
title_sort	finding motifs using dna images derived from sparse representations
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10290554/ https://www.ncbi.nlm.nih.gov/pubmed/37294804 http://dx.doi.org/10.1093/bioinformatics/btad378
work_keys_str_mv	AT chushanek findingmotifsusingdnaimagesderivedfromsparserepresentations AT stormogaryd findingmotifsusingdnaimagesderivedfromsparserepresentations
