Methods for identifying emergent concepts in deep neural networks

Bibliographic Details
Main Author: Räz, Tim
Format: Online Article Text
Language: English
Published: Elsevier 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10318355/
https://www.ncbi.nlm.nih.gov/pubmed/37409048
http://dx.doi.org/10.1016/j.patter.2023.100761
Description
Summary: The present perspective discusses methods to detect concepts in internal representations (hidden layers) of deep neural networks (DNNs), such as network dissection, feature visualization, and testing with concept activation vectors (TCAV). I argue that these methods provide evidence that DNNs are able to learn non-trivial relations between concepts. However, the methods also require users to specify or detect concepts via (sets of) instances. This underdetermines the meaning of concepts, making the methods unreliable. The problem could be overcome, to some extent, by systematically combining the methods and by using synthetic datasets. The perspective also discusses how conceptual spaces—sets of concepts in internal representations—are shaped by a trade-off between predictive accuracy and compression. I argue that conceptual spaces are useful, or even necessary, to understand how concepts are formed in DNNs, but that there is a lack of methods for studying conceptual spaces.
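The summary's point that TCAV "requires users to specify concepts via (sets of) instances" can be illustrated with a minimal sketch. The following is not the paper's code: it uses synthetic, hypothetical activations (an invented 8-dimensional hidden layer) in place of a real network. A linear classifier is fit to separate concept-example activations from random-example activations; the normal of its decision boundary is the concept activation vector (CAV), and the TCAV score is the fraction of inputs whose class score increases in the CAV direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden-layer activations (8-D). Concept examples are
# shifted along a known direction; random counterexamples are not.
# The user's choice of these instance sets is what underdetermines
# the concept's meaning.
true_dir = np.array([1.0, 0, 0, 0, 0, 0, 0, 0])
concept_acts = rng.normal(size=(50, 8)) + 3.0 * true_dir
random_acts = rng.normal(size=(50, 8))

# Fit a logistic-regression separator by plain gradient descent;
# its (normalized) weight vector is the concept activation vector.
X = np.vstack([concept_acts, random_acts])
y = np.concatenate([np.ones(50), np.zeros(50)])
w = np.zeros(8)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))       # predicted probabilities
    w -= 0.1 * (X.T @ (p - y)) / len(y)      # average gradient step
cav = w / np.linalg.norm(w)

# TCAV score: fraction of test inputs whose class score increases
# along the CAV. Here the per-input "gradients" of the class score
# w.r.t. the layer are stand-in random vectors with a mild bias.
grads = rng.normal(size=(100, 8)) + 0.5 * true_dir
tcav_score = float(np.mean(grads @ cav > 0))
```

Because the CAV is recovered only from the chosen instance sets, a different choice of concept or counterexample instances would yield a different direction, and hence a different TCAV score, which is the underdetermination problem the perspective raises.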