Explainability in transformer models for functional genomics

The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally concerns the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models. In this paper, we present a new approach that has been successful in gathering insights on the transcription process in Escherichia coli. This work builds upon a transformer-based neural network framework designed for prokaryotic genome annotation purposes. We find that the majority of subunits (attention heads) of the model are specialized towards identifying transcription factors and are able to successfully characterize both their binding sites and consensus sequences, uncovering both well-known and potentially novel elements involved in the initiation of the transcription process. With the specialization of the attention heads occurring automatically, we believe transformer models to be of high interest towards the creation of explainable neural networks in this field.


Bibliographic Details
Main Authors: Clauwaert, Jim, Menschaert, Gerben, Waegeman, Willem
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Problem Solving Protocol
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8425421/
https://www.ncbi.nlm.nih.gov/pubmed/33834200
http://dx.doi.org/10.1093/bib/bbab060
_version_ 1783749846592651264
author Clauwaert, Jim
Menschaert, Gerben
Waegeman, Willem
author_facet Clauwaert, Jim
Menschaert, Gerben
Waegeman, Willem
author_sort Clauwaert, Jim
collection PubMed
description The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally concerns the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models. In this paper, we present a new approach that has been successful in gathering insights on the transcription process in Escherichia coli. This work builds upon a transformer-based neural network framework designed for prokaryotic genome annotation purposes. We find that the majority of subunits (attention heads) of the model are specialized towards identifying transcription factors and are able to successfully characterize both their binding sites and consensus sequences, uncovering both well-known and potentially novel elements involved in the initiation of the transcription process. With the specialization of the attention heads occurring automatically, we believe transformer models to be of high interest towards the creation of explainable neural networks in this field.
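The description above notes that individual attention heads of the trained transformer specialize towards transcription factors and characterize their binding sites. As a purely illustrative sketch (not the authors' framework or code), the Python snippet below shows the kind of per-head inspection this implies: a DNA sequence is one-hot encoded, passed through a single, randomly initialised scaled dot-product attention head, and the attention received by each position is aggregated. The sequence, head dimension and helper names are arbitrary assumptions made for the example.

import numpy as np

def one_hot(seq):
    """One-hot encode a DNA string into a (length, 4) array (order A, C, G, T)."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    x = np.zeros((len(seq), 4))
    for i, nt in enumerate(seq):
        x[i, idx[nt]] = 1.0
    return x

def attention_map(x, d_k=8, seed=0):
    """Attention weights of one randomly initialised scaled dot-product head."""
    rng = np.random.default_rng(seed)
    w_q = rng.normal(size=(x.shape[1], d_k))   # query projection
    w_k = rng.normal(size=(x.shape[1], d_k))   # key projection
    q, k = x @ w_q, x @ w_k
    scores = q @ k.T / np.sqrt(d_k)
    scores -= scores.max(axis=1, keepdims=True)           # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)   # each row sums to 1

# Arbitrary example sequence (hypothetical, not taken from the paper).
seq = "TTGACAATTAATCATCGAACTAGTTAACTAGTACGCAAGTTCACGTAAAAAGG"
attn = attention_map(one_hot(seq))

# Attention received by each position, summed over all query positions; for a
# trained, specialised head, peaks would be expected over motif-like regions
# such as transcription-factor binding sites.
received = attn.sum(axis=0)
for pos in sorted(np.argsort(received)[::-1][:5]):
    print(f"position {pos:2d} ({seq[pos]}): attention received = {received[pos]:.3f}")

In the study itself, such an aggregation would be applied to the heads of the trained annotation model rather than to random weights; the random head here only demonstrates the mechanics of reading out per-position attention.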
format Online
Article
Text
id pubmed-8425421
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8425421 2021-09-09 Explainability in transformer models for functional genomics Clauwaert, Jim Menschaert, Gerben Waegeman, Willem Brief Bioinform Problem Solving Protocol The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally concerns the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models. In this paper, we present a new approach that has been successful in gathering insights on the transcription process in Escherichia coli. This work builds upon a transformer-based neural network framework designed for prokaryotic genome annotation purposes. We find that the majority of subunits (attention heads) of the model are specialized towards identifying transcription factors and are able to successfully characterize both their binding sites and consensus sequences, uncovering both well-known and potentially novel elements involved in the initiation of the transcription process. With the specialization of the attention heads occurring automatically, we believe transformer models to be of high interest towards the creation of explainable neural networks in this field. Oxford University Press 2021-04-08 /pmc/articles/PMC8425421/ /pubmed/33834200 http://dx.doi.org/10.1093/bib/bbab060 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Problem Solving Protocol
Clauwaert, Jim
Menschaert, Gerben
Waegeman, Willem
Explainability in transformer models for functional genomics
title Explainability in transformer models for functional genomics
title_full Explainability in transformer models for functional genomics
title_fullStr Explainability in transformer models for functional genomics
title_full_unstemmed Explainability in transformer models for functional genomics
title_short Explainability in transformer models for functional genomics
title_sort explainability in transformer models for functional genomics
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8425421/
https://www.ncbi.nlm.nih.gov/pubmed/33834200
http://dx.doi.org/10.1093/bib/bbab060
work_keys_str_mv AT clauwaertjim explainabilityintransformermodelsforfunctionalgenomics
AT menschaertgerben explainabilityintransformermodelsforfunctionalgenomics
AT waegemanwillem explainabilityintransformermodelsforfunctionalgenomics