Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16
Most Earth‐system simulations run on conventional central processing units in 64‐bit double precision floating‐point numbers Float64, although the need for high‐precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world's fastest supercomputer...
Main Authors: | Klöwer, Milan, Hatfield, Sam, Croci, Matteo, Düben, Peter D., Palmer, Tim N. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
John Wiley and Sons Inc.
2022
|
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9287017/ https://www.ncbi.nlm.nih.gov/pubmed/35866041 http://dx.doi.org/10.1029/2021MS002684 |
_version_ | 1784748154576437248 |
---|---|
author | Klöwer, Milan Hatfield, Sam Croci, Matteo Düben, Peter D. Palmer, Tim N. |
author_facet | Klöwer, Milan Hatfield, Sam Croci, Matteo Düben, Peter D. Palmer, Tim N. |
author_sort | Klöwer, Milan |
collection | PubMed |
description | Most Earth‐system simulations run on conventional central processing units in 64‐bit double precision floating‐point numbers Float64, although the need for high‐precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world's fastest supercomputer, is based on A64FX microprocessors, which also support the 16‐bit low‐precision format Float16. We investigate the Float16 performance on A64FX with ShallowWaters.jl, the first fluid circulation model that runs entirely with 16‐bit arithmetic. The model implements techniques that address precision and dynamic range issues in 16 bits. The precision‐critical time integration is augmented to include compensated summation to minimize rounding errors. Such a compensated time integration is as precise but faster than mixed precision with 16 and 32‐bit floats. As subnormals are inefficiently supported on A64FX the very limited range available in Float16 is 6 × 10⁻⁵ to 65,504. We develop the analysis‐number format Sherlogs.jl to log the arithmetic results during the simulation. The equations in ShallowWaters.jl are then systematically rescaled to fit into Float16, using 97% of the available representable numbers. Consequently, we benchmark speedups of up to 3.8x on A64FX with Float16. Adding a compensated time integration, speedups reach up to 3.6x. Although ShallowWaters.jl is simplified compared to large Earth‐system models, it shares essential algorithms and therefore shows that 16‐bit calculations are indeed a competitive way to accelerate Earth‐system simulations on available hardware. |
format | Online Article Text |
id | pubmed-9287017 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-92870172022-07-19 Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16 Klöwer, Milan Hatfield, Sam Croci, Matteo Düben, Peter D. Palmer, Tim N. J Adv Model Earth Syst Research Article Most Earth‐system simulations run on conventional central processing units in 64‐bit double precision floating‐point numbers Float64, although the need for high‐precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world's fastest supercomputer, is based on A64FX microprocessors, which also support the 16‐bit low‐precision format Float16. We investigate the Float16 performance on A64FX with ShallowWaters.jl, the first fluid circulation model that runs entirely with 16‐bit arithmetic. The model implements techniques that address precision and dynamic range issues in 16 bits. The precision‐critical time integration is augmented to include compensated summation to minimize rounding errors. Such a compensated time integration is as precise but faster than mixed precision with 16 and 32‐bit floats. As subnormals are inefficiently supported on A64FX the very limited range available in Float16 is 6 × 10⁻⁵ to 65,504. We develop the analysis‐number format Sherlogs.jl to log the arithmetic results during the simulation. The equations in ShallowWaters.jl are then systematically rescaled to fit into Float16, using 97% of the available representable numbers. Consequently, we benchmark speedups of up to 3.8x on A64FX with Float16. Adding a compensated time integration, speedups reach up to 3.6x. Although ShallowWaters.jl is simplified compared to large Earth‐system models, it shares essential algorithms and therefore shows that 16‐bit calculations are indeed a competitive way to accelerate Earth‐system simulations on available hardware. John Wiley and Sons Inc. 
2022-02-11 2022-02 /pmc/articles/PMC9287017/ /pubmed/35866041 http://dx.doi.org/10.1029/2021MS002684 Text en © 2022 The Authors. Journal of Advances in Modeling Earth Systems published by Wiley Periodicals LLC on behalf of American Geophysical Union. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Klöwer, Milan Hatfield, Sam Croci, Matteo Düben, Peter D. Palmer, Tim N. Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16 |
title | Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16 |
title_sort | fluid simulations accelerated with 16 bits: approaching 4x speedup on a64fx by squeezing shallowwaters.jl into float16 |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9287017/ https://www.ncbi.nlm.nih.gov/pubmed/35866041 http://dx.doi.org/10.1029/2021MS002684 |
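
The description above mentions that the time integration is augmented with compensated summation to limit rounding errors in 16 bits. The following is a minimal sketch of that general idea, not the ShallowWaters.jl implementation: it uses Kahan-style compensated summation in plain Python, emulating Float16 by rounding every intermediate result through the standard library's half-precision `struct` format. The helper names (`f16`, `integrate_naive`, `integrate_compensated`), the starting value, the increment, and the step count are all hypothetical choices made for illustration.

```python
import struct


def f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def integrate_naive(x0: float, increment: float, nsteps: int) -> float:
    """Accumulate a constant increment, rounding to 16 bits after every add."""
    x, inc = f16(x0), f16(increment)
    for _ in range(nsteps):
        x = f16(x + inc)
    return x


def integrate_compensated(x0: float, increment: float, nsteps: int) -> float:
    """Same accumulation, but carrying the rounding error forward (Kahan)."""
    x, inc = f16(x0), f16(increment)
    c = 0.0  # running compensation: the part of earlier increments lost to rounding
    for _ in range(nsteps):
        y = f16(inc - c)          # increment corrected by the previous error
        t = f16(x + y)            # rounded update of the state
        c = f16(f16(t - x) - y)   # what was actually added minus what was intended
        x = t
    return x


# Hypothetical setup: the increment is smaller than half the Float16 spacing at 1.0
naive = integrate_naive(1.0, 1e-4, 1000)
compensated = integrate_compensated(1.0, 1e-4, 1000)
print("naive:      ", naive)
print("compensated:", compensated)
```

In this setup every naive addition rounds back to the starting value, so the naive state never moves, while the compensated version ends close to the exact sum of 1.1. The extra cost is a few additional 16-bit operations per step, which is consistent with the description's point that the compensated integration remains faster than mixing 16- and 32-bit floats.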