Representation learning of genomic sequence motifs with convolutional neural networks
Main authors: | Koo, Peter K.; Eddy, Sean R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2019 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6941814/ https://www.ncbi.nlm.nih.gov/pubmed/31856220 http://dx.doi.org/10.1371/journal.pcbi.1007560 |
author | Koo, Peter K.; Eddy, Sean R. |
---|---|
collection | PubMed |
description | Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they build representations of regulatory genomic sequences. Here we perform systematic experiments on synthetic sequences to reveal how CNN architecture, specifically convolutional filter size and max-pooling, influences the extent that sequence motif representations are learned by first layer filters. We find that CNNs designed to foster hierarchical representation learning of sequence motifs—assembling partial features into whole features in deeper layers—tend to learn distributed representations, i.e. partial motifs. On the other hand, CNNs that are designed to limit the ability to hierarchically build sequence motif representations in deeper layers tend to learn more interpretable localist representations, i.e. whole motifs. We then validate that this representation learning principle established from synthetic sequences generalizes to in vivo sequences. |
id | pubmed-6941814 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | PLoS Comput Biol |
published | 2019-12-19 |
rights | © 2019 Koo, Eddy. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
topic | Research Article |
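The abstract argues that max-pooling after the first convolutional layer controls how much spatial detail deeper layers receive, and therefore whether first-layer filters end up learning partial or whole motifs. The toy sketch below illustrates that idea under stated assumptions; it is a hypothetical illustration, not the authors' model or code. The hand-set ±1 filter weights, the helper names (`one_hot`, `motif_filter`, `conv1d`, `max_pool`, `first_layer`), and the half-motifs "GA"/"TA" are all invented for the example, and no training is involved.

```python
# Toy illustration (hypothetical; not the architecture or code from Koo & Eddy, 2019).
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as L rows of 4 one-hot values in A, C, G, T order."""
    return [[1.0 if base == a else 0.0 for a in ALPHABET] for base in seq]


def motif_filter(motif):
    """Hypothetical hand-set filter: +1 for the motif base at each position, -1 for the other three."""
    return [[1.0 if base == a else -1.0 for a in ALPHABET] for base in motif]


def conv1d(x, filt):
    """'Valid' (unpadded) 1D cross-correlation of one-hot rows x (L x 4) with filt (K x 4)."""
    k = len(filt)
    return [
        sum(x[i + j][c] * filt[j][c] for j in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]


def max_pool(v, size):
    """Non-overlapping max-pooling with stride == size; the last window may be shorter."""
    return [max(v[i:i + size]) for i in range(0, len(v), size)]


def first_layer(seq, motifs, pool):
    """Convolution -> ReLU -> max-pooling, returning one pooled map per filter."""
    x = one_hot(seq)
    maps = []
    for m in motifs:
        scores = [max(0.0, s) for s in conv1d(x, motif_filter(m))]  # ReLU
        maps.append(max_pool(scores, pool))
    return maps


# Two half-motif filters, and two sequences containing the halves in opposite order.
halves = ["GA", "TA"]
for pool in (2, 8):
    a = first_layer("TTGATATT", halves, pool)  # GA then TA (contains GATA)
    b = first_layer("TTTAGATT", halves, pool)  # TA then GA (contains TAGA)
    print(pool, a, b, "distinguishable" if a != b else "identical")
```

Running it prints `distinguishable` for `pool=2` and `identical` for `pool=8`. With small pools, the relative order of the two partial-motif matches survives, so a deeper layer could in principle assemble "GA" + "TA" into "GATA". With a pool spanning the whole sequence, that order is discarded, so a first-layer filter would have to match the whole motif by itself. This is one way to read the abstract's contrast between distributed (partial-motif) and localist (whole-motif) representations.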