EmbedFormer: Embedded Depth-Wise Convolution Layer for Token Mixing

Visual Transformers (ViTs) have shown impressive performance due to their powerful coding ability to catch spatial and channel information. MetaFormer gives us a general architecture of transformers consisting of a token mixer and a channel mixer through which we can generally understand how transfo...

Bibliographic Details
Main Authors: Wang, Zeji; He, Xiaowei; Li, Yi; Chuai, Qinliang
Format: Online Article Text
Language: English
Published: MDPI 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9782848/
https://www.ncbi.nlm.nih.gov/pubmed/36560222
http://dx.doi.org/10.3390/s22249854