
Representation learning of genomic sequence motifs with convolutional neural networks

Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they build representations of regulatory genomic sequences. Here we perform systematic experiments on synthetic sequences to reveal h...


Bibliographic Details
Main Authors: Koo, Peter K., Eddy, Sean R.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6941814/
https://www.ncbi.nlm.nih.gov/pubmed/31856220
http://dx.doi.org/10.1371/journal.pcbi.1007560
author Koo, Peter K.
Eddy, Sean R.
author_facet Koo, Peter K.
Eddy, Sean R.
author_sort Koo, Peter K.
collection PubMed
description Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they build representations of regulatory genomic sequences. Here we perform systematic experiments on synthetic sequences to reveal how CNN architecture, specifically convolutional filter size and max-pooling, influences the extent that sequence motif representations are learned by first layer filters. We find that CNNs designed to foster hierarchical representation learning of sequence motifs—assembling partial features into whole features in deeper layers—tend to learn distributed representations, i.e. partial motifs. On the other hand, CNNs that are designed to limit the ability to hierarchically build sequence motif representations in deeper layers tend to learn more interpretable localist representations, i.e. whole motifs. We then validate that this representation learning principle established from synthetic sequences generalizes to in vivo sequences.
format Online
Article
Text
id pubmed-6941814
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6941814 2020-01-10 Representation learning of genomic sequence motifs with convolutional neural networks Koo, Peter K. Eddy, Sean R. PLoS Comput Biol Research Article Public Library of Science 2019-12-19 /pmc/articles/PMC6941814/ /pubmed/31856220 http://dx.doi.org/10.1371/journal.pcbi.1007560 Text en © 2019 Koo, Eddy http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Koo, Peter K.
Eddy, Sean R.
Representation learning of genomic sequence motifs with convolutional neural networks
title Representation learning of genomic sequence motifs with convolutional neural networks
title_full Representation learning of genomic sequence motifs with convolutional neural networks
title_fullStr Representation learning of genomic sequence motifs with convolutional neural networks
title_full_unstemmed Representation learning of genomic sequence motifs with convolutional neural networks
title_short Representation learning of genomic sequence motifs with convolutional neural networks
title_sort representation learning of genomic sequence motifs with convolutional neural networks
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6941814/
https://www.ncbi.nlm.nih.gov/pubmed/31856220
http://dx.doi.org/10.1371/journal.pcbi.1007560