Neural networks with circular filters enable data efficient inference of sequence motifs

MOTIVATION: Nucleic acids and proteins often have localized sequence motifs that enable highly specific interactions. Due to the biological relevance of sequence motifs, numerous inference methods have been developed. Recently, convolutional neural networks (CNNs) have achieved state-of-the-art perf...

Bibliographic Details
Main Authors: Blum, Christopher F, Kollmann, Markus
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6792110/
https://www.ncbi.nlm.nih.gov/pubmed/30918943
http://dx.doi.org/10.1093/bioinformatics/btz194
_version_ 1783459083525816320
author Blum, Christopher F
Kollmann, Markus
author_facet Blum, Christopher F
Kollmann, Markus
author_sort Blum, Christopher F
collection PubMed
description MOTIVATION: Nucleic acids and proteins often have localized sequence motifs that enable highly specific interactions. Due to the biological relevance of sequence motifs, numerous inference methods have been developed. Recently, convolutional neural networks (CNNs) have achieved state-of-the-art performance. These methods were able to learn transcription factor binding sites from ChIP-seq data, resulting in accurate predictions on test data. However, CNNs typically distribute learned motifs across multiple filters, making them difficult to interpret. Furthermore, networks trained on small datasets often do not generalize well to new sequences. RESULTS: Here we present circular filters, a novel convolutional architecture that convolves sequences with circularly permuted variants of the same filter. We motivate circular filters by the observation that CNNs frequently learn filters that correspond to shifted and truncated variants of the true motif. Circular filters enable learning of full-length motifs and allow easy interpretation of the learned filters. We show that circular filters improve motif inference performance over a wide range of hyperparameters as well as sequence lengths. Furthermore, we show that CNNs with circular filters in most cases outperform conventional CNNs at inferring DNA binding sites from ChIP-seq data. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/christopherblum. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6792110
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-67921102019-10-18 Neural networks with circular filters enable data efficient inference of sequence motifs Blum, Christopher F Kollmann, Markus Bioinformatics Original Papers MOTIVATION: Nucleic acids and proteins often have localized sequence motifs that enable highly specific interactions. Due to the biological relevance of sequence motifs, numerous inference methods have been developed. Recently, convolutional neural networks (CNNs) have achieved state of the art performance. These methods were able to learn transcription factor binding sites from ChIP-seq data, resulting in accurate predictions on test data. However, CNNs typically distribute learned motifs across multiple filters, making them difficult to interpret. Furthermore, networks trained on small datasets often do not generalize well to new sequences. RESULTS: Here we present circular filters, a novel convolutional architecture, that convolves sequences with circularly permutated variants of the same filter. We motivate circular filters by the observation that CNNs frequently learn filters that correspond to shifted and truncated variants of the true motif. Circular filters enable learning of full-length motifs and allow easy interpretation of the learned filters. We show that circular filters improve motif inference performance over a wide range of hyperparameters as well as sequence length. Furthermore, we show that CNNs with circular filters in most cases outperform conventional CNNs at inferring DNA binding sites from ChIP-seq data. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/christopherblum. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-10-15 2019-03-27 /pmc/articles/PMC6792110/ /pubmed/30918943 http://dx.doi.org/10.1093/bioinformatics/btz194 Text en © The Author(s) 2019. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Original Papers
Blum, Christopher F
Kollmann, Markus
Neural networks with circular filters enable data efficient inference of sequence motifs
title Neural networks with circular filters enable data efficient inference of sequence motifs
title_full Neural networks with circular filters enable data efficient inference of sequence motifs
title_fullStr Neural networks with circular filters enable data efficient inference of sequence motifs
title_full_unstemmed Neural networks with circular filters enable data efficient inference of sequence motifs
title_short Neural networks with circular filters enable data efficient inference of sequence motifs
title_sort neural networks with circular filters enable data efficient inference of sequence motifs
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6792110/
https://www.ncbi.nlm.nih.gov/pubmed/30918943
http://dx.doi.org/10.1093/bioinformatics/btz194
work_keys_str_mv AT blumchristopherf neuralnetworkswithcircularfiltersenabledataefficientinferenceofsequencemotifs
AT kollmannmarkus neuralnetworkswithcircularfiltersenabledataefficientinferenceofsequencemotifs
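
The abstract above describes circular filters as convolving a sequence with circularly permuted variants of the same filter. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function names (`one_hot`, `circular_variants`, `circular_filter_scores`), the toy motif `TGAC`, the toy sequence, and the choice to take the maximum over all shifts are assumptions made for illustration only.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot matrix."""
    idx = np.array([ALPHABET.index(c) for c in seq])
    return np.eye(len(ALPHABET))[idx]


def circular_variants(filt):
    """Return every circular shift of a (width, 4) filter along the position axis."""
    width = filt.shape[0]
    return np.stack([np.roll(filt, shift, axis=0) for shift in range(width)])


def circular_filter_scores(seq, filt):
    """Cross-correlate seq with each circular shift of filt.

    Returns an array of shape (n_shifts, n_positions).
    """
    x = one_hot(seq)
    width = filt.shape[0]
    # All sliding windows of the sequence, shape (n_positions, width, 4).
    windows = np.stack([x[i:i + width] for i in range(len(seq) - width + 1)])
    variants = circular_variants(filt)
    # scores[s, p] = sum over window positions and letters of variant s times window p.
    return np.einsum("swc,pwc->sp", variants, windows)


# Hypothetical filter that matches the toy motif exactly.
filt = one_hot("TGAC")
scores = circular_filter_scores("AATGACGG", filt)
best_shift, best_pos = np.unravel_index(scores.argmax(), scores.shape)

# A filter that has "learned" a shifted copy of the motif spans the same set
# of circular variants, so its best score over all shifts is unchanged.
shifted_scores = circular_filter_scores("AATGACGG", np.roll(filt, 2, axis=0))
```

In this toy case the unshifted variant scores highest at the position where `TGAC` occurs, and the circularly shifted filter reaches the same maximum. This illustrates why sharing weights across shifts could make a filter robust to learning a shifted version of the motif. A trained network would learn the filter weights and might combine the per-shift responses differently.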