Cargando…

DGEMM Using Tensor Cores, and Its Accurate and Reproducible Versions

This paper proposes a method for implementing dense matrix multiplication on FP64 (DGEMM) and FP32 (SGEMM) using Tensor Cores on NVIDIA’s graphics processing units (GPUs). Tensor Cores are special processing units that perform [Formula: see text] matrix multiplications on FP16 inputs with FP32 preci...

Descripción completa

Detalles Bibliográficos
Autores principales: Mukunoki, Daichi, Ozaki, Katsuhisa, Ogita, Takeshi, Imamura, Toshiyuki
Formato: Online Artículo Texto
Lenguaje:English
Publicado: 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295351/
http://dx.doi.org/10.1007/978-3-030-50743-5_12