Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16

Most Earth‐system simulations run on conventional central processing units in 64‐bit double precision floating‐point numbers Float64, although the need for high‐precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world's fastest supercomputer...

Bibliographic Details
Main Authors: Klöwer, Milan; Hatfield, Sam; Croci, Matteo; Düben, Peter D.; Palmer, Tim N.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9287017/
https://www.ncbi.nlm.nih.gov/pubmed/35866041
http://dx.doi.org/10.1029/2021MS002684
author Klöwer, Milan
Hatfield, Sam
Croci, Matteo
Düben, Peter D.
Palmer, Tim N.
collection PubMed
description Most Earth‐system simulations run on conventional central processing units in 64‐bit double‐precision floating‐point numbers (Float64), although the need for high‐precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world's fastest supercomputer, is based on A64FX microprocessors, which also support the 16‐bit low‐precision format Float16. We investigate the Float16 performance on A64FX with ShallowWaters.jl, the first fluid circulation model that runs entirely with 16‐bit arithmetic. The model implements techniques that address precision and dynamic range issues in 16 bits. The precision‐critical time integration is augmented to include compensated summation to minimize rounding errors. Such a compensated time integration is as precise as, but faster than, mixed precision with 16‐ and 32‐bit floats. Because subnormals are inefficiently supported on A64FX, the very limited range available in Float16 is 6 × 10^−5 to 65,504. We develop the analysis‐number format Sherlogs.jl to log the arithmetic results during the simulation. The equations in ShallowWaters.jl are then systematically rescaled to fit into Float16, using 97% of the available representable numbers. Consequently, we benchmark speedups of up to 3.8x on A64FX with Float16. With a compensated time integration added, speedups reach up to 3.6x. Although ShallowWaters.jl is simplified compared to large Earth‐system models, it shares essential algorithms and therefore shows that 16‐bit calculations are indeed a competitive way to accelerate Earth‐system simulations on available hardware.
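The compensated summation mentioned in the description can be illustrated with a minimal sketch. This is not the ShallowWaters.jl implementation (which is written in Julia); it is a generic Kahan-style compensated accumulation, with Float16 rounding emulated through the half-precision format code of Python's standard `struct` module. The names `f16`, `naive_accumulate`, and `compensated_accumulate` are hypothetical and chosen only for this illustration, and the start value, increment, and step count are arbitrary assumptions.

```python
import struct

F16_MAX = 65504.0            # largest finite Float16 value
F16_MIN_NORMAL = 2.0 ** -14  # smallest normal Float16 value, about 6.1e-5


def f16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 (Float16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def naive_accumulate(start: float, increment: float, steps: int) -> float:
    """Add `increment` to `start` `steps` times, rounding to Float16 each step."""
    s = f16(start)
    inc = f16(increment)
    for _ in range(steps):
        s = f16(s + inc)
    return s


def compensated_accumulate(start: float, increment: float, steps: int) -> float:
    """Same accumulation, but carrying the rounding error forward (Kahan-style)."""
    s = f16(start)
    inc = f16(increment)
    c = 0.0  # running estimate of the rounding error made so far
    for _ in range(steps):
        y = f16(inc - c)         # correct the increment by the stored error
        t = f16(s + y)           # rounded update; low-order bits of y may be lost
        c = f16(f16(t - s) - y)  # recover what was lost in the rounded update
        s = t
    return s


if __name__ == "__main__":
    print("naive:      ", naive_accumulate(100.0, 0.01, 1000))
    print("compensated:", compensated_accumulate(100.0, 0.01, 1000))
```

In this sketch the naive sum stays at exactly 100.0, because each increment of about 0.01 is smaller than half the Float16 spacing near 100 (0.0625) and is rounded away at every step. The compensated version stores the lost part in `c` and ends close to 110, the value expected from 1000 increments. All intermediate values, including the compensation term, are themselves kept in emulated Float16, so no higher-precision accumulator is needed.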
format Online
Article
Text
id pubmed-9287017
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-9287017 2022-07-19 J Adv Model Earth Syst Research Article John Wiley and Sons Inc. 2022-02-11 2022-02 /pmc/articles/PMC9287017/ /pubmed/35866041 http://dx.doi.org/10.1029/2021MS002684 Text en © 2022 The Authors. Journal of Advances in Modeling Earth Systems published by Wiley Periodicals LLC on behalf of American Geophysical Union. This is an open access article under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16
topic Research Article