Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16

Bibliographic Details
Main Authors: Klöwer, Milan; Hatfield, Sam; Croci, Matteo; Düben, Peter D.; Palmer, Tim N.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9287017/
https://www.ncbi.nlm.nih.gov/pubmed/35866041
http://dx.doi.org/10.1029/2021MS002684
Description
Summary: Most Earth‐system simulations run on conventional central processing units in 64‐bit double precision floating‐point numbers Float64, although the need for high‐precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world's fastest supercomputer, is based on A64FX microprocessors, which also support the 16‐bit low‐precision format Float16. We investigate the Float16 performance on A64FX with ShallowWaters.jl, the first fluid circulation model that runs entirely with 16‐bit arithmetic. The model implements techniques that address precision and dynamic range issues in 16 bits. The precision‐critical time integration is augmented to include compensated summation to minimize rounding errors. Such a compensated time integration is as precise as, but faster than, mixed precision with 16 and 32‐bit floats. As subnormals are inefficiently supported on A64FX, the very limited range available in Float16 is 6 × 10⁻⁵ to 65,504. We develop the analysis‐number format Sherlogs.jl to log the arithmetic results during the simulation. The equations in ShallowWaters.jl are then systematically rescaled to fit into Float16, using 97% of the available representable numbers. Consequently, we benchmark speedups of up to 3.8x on A64FX with Float16. Adding a compensated time integration, speedups reach up to 3.6x. Although ShallowWaters.jl is simplified compared to large Earth‐system models, it shares essential algorithms and therefore shows that 16‐bit calculations are indeed a competitive way to accelerate Earth‐system simulations on available hardware.
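
The compensated summation mentioned in the summary can be illustrated with a minimal sketch. This is not code from ShallowWaters.jl: it is a generic Kahan-style summation written in Python, and the names `f16`, `naive_sum`, and `compensated_sum` are hypothetical. It assumes that rounding every intermediate result to IEEE 754 half precision (via the standard `struct` format `"e"`) is an adequate stand-in for Float16 hardware arithmetic, and it uses a constant increment where a real time integration would add a different tendency at each step.

```python
import struct


def f16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (Float16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def naive_sum(x0: float, increment: float, n_steps: int) -> float:
    """Add the increment n_steps times, rounding to Float16 after each add."""
    x = f16(x0)
    inc = f16(increment)
    for _ in range(n_steps):
        x = f16(x + inc)
    return x


def compensated_sum(x0: float, increment: float, n_steps: int) -> float:
    """Same accumulation, but with a Kahan-style compensation term.

    The compensation c holds the rounding error of the previous addition
    and is fed back into the next increment. Every intermediate value is
    rounded to Float16, so no higher precision is carried between steps.
    """
    x = f16(x0)
    inc = f16(increment)
    c = 0.0
    for _ in range(n_steps):
        y = f16(inc - c)         # increment corrected by the previous error
        t = f16(x + y)           # rounded update of the accumulated value
        c = f16(f16(t - x) - y)  # what was actually added minus what was intended
        x = t
    return x


# Illustrative numbers only: near 1.0 the Float16 spacing is 2**-10 (about 0.001),
# so an increment of 1e-4 is below half that spacing and is lost by the naive sum.
naive = naive_sum(1.0, 1e-4, 1000)
compensated = compensated_sum(1.0, 1e-4, 1000)
print(naive, compensated)
```

In this toy setting the naive accumulation never leaves 1.0, because each individual increment rounds away, whereas the compensated version ends close to the exact result of 1.1 to within Float16 precision. The cost is a few extra 16-bit operations per step and one additional stored value per variable.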