Accelerated linear algebra compiler for computationally efficient numerical models: Success and potential area of improvement

Bibliographic Details
Main Author: He, Xuzhen
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9956871/
https://www.ncbi.nlm.nih.gov/pubmed/36827434
http://dx.doi.org/10.1371/journal.pone.0282265
_version_ 1784894686722260992
author He, Xuzhen
author_facet He, Xuzhen
author_sort He, Xuzhen
collection PubMed
description The recent dramatic progress in machine learning is partially attributed to the availability of high-performance computers and development tools. The accelerated linear algebra (XLA) compiler is one such tool that automatically optimises array operations (mostly through fusion, which reduces memory operations) and compiles the optimised operations into high-performance programs specific to target computing platforms. Like machine-learning models, numerical models are often expressed in array operations, and thus their performance can be boosted by XLA. This study is the first of its kind to examine the efficiency of XLA for numerical models, and the efficiency is examined stringently by comparing its performance with that of optimal implementations. Two shared-memory computing platforms are examined: the CPU platform and the GPU platform. To obtain optimal implementations, the computing speed and its optimisation are rigorously studied by considering different workloads and the corresponding computer performance. Two simple equations are found to faithfully model the computing speed of numerical models with very few easily measurable parameters. Regarding operation optimisation within XLA, results show that models expressed in low-level operations (e.g., slice, concatenation, and arithmetic operations) are successfully fused, while high-level operations (e.g., convolution and roll) are not. Regarding compilation within XLA, results show that for the CPU platform of certain computers, and for certain simple numerical models on the GPU platform, XLA achieves high efficiency (>80%) for large problems and acceptable efficiency (10%–80%) for medium-size problems; the gap is due to the overhead cost of Python. Unsatisfactory performance is found for the CPU platform of other computers (where operations are compiled in a non-optimal way) and for high-dimensional complex models on the GPU platform, where each GPU thread in XLA handles 4 (single-precision) or 2 (double-precision) output elements, hoping to exploit the high-performance instructions that can read or write 4 or 2 floating-point numbers in one instruction. However, these instructions are rarely used in the generated code for complex models, and performance is negatively affected. Therefore, flags should be added to control the compilation for these non-optimal scenarios.
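A note on the fusion result in the description above: the following is a minimal sketch (illustrative only, not code from the paper) using JAX, a common XLA front end. It expresses a one-dimensional diffusion step purely in the slice, concatenation, and element-wise arithmetic operations that the study reports XLA fuses successfully; the function name, grid size, and step parameter are assumptions for the example.

```python
# Minimal sketch (assumed example, not from the paper): a 1-D periodic
# diffusion step built only from slice, concatenation, and element-wise
# arithmetic, the class of low-level operations the study reports XLA fuses.
import jax
import jax.numpy as jnp

def diffusion_step(u, alpha=0.25):
    # Periodic neighbours constructed with slices and concatenation.
    left = jnp.concatenate([u[-1:], u[:-1]])
    right = jnp.concatenate([u[1:], u[:1]])
    # Explicit finite-difference update; XLA's fusion pass can merge these
    # operations into a single memory-bound kernel instead of several passes.
    return u + alpha * (left - 2.0 * u + right)

step = jax.jit(diffusion_step)  # compile through XLA

u = jnp.linspace(0.0, 1.0, 1024)
u = step(u)  # first call triggers compilation; later calls reuse the kernel
```

Swapping the slice-based neighbours for a high-level operation such as jnp.roll(u, 1) exercises the case the description reports as not fused. The two speed equations themselves are not reproduced in the abstract; a common form for such models (an assumption here, not the paper's notation) is a roofline-style bound t(n) = t_0 + max(n_bytes/B, n_flops/F), where the overhead t_0, the memory bandwidth B, and the peak floating-point rate F are the few measurable parameters.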
format Online
Article
Text
id pubmed-9956871
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9956871 2023-02-25 Accelerated linear algebra compiler for computationally efficient numerical models: Success and potential area of improvement He, Xuzhen PLoS One Research Article Public Library of Science 2023-02-24 /pmc/articles/PMC9956871/ /pubmed/36827434 http://dx.doi.org/10.1371/journal.pone.0282265 Text en © 2023 Xuzhen He. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
He, Xuzhen
Accelerated linear algebra compiler for computationally efficient numerical models: Success and potential area of improvement
title Accelerated linear algebra compiler for computationally efficient numerical models: Success and potential area of improvement
title_full Accelerated linear algebra compiler for computationally efficient numerical models: Success and potential area of improvement
title_fullStr Accelerated linear algebra compiler for computationally efficient numerical models: Success and potential area of improvement
title_full_unstemmed Accelerated linear algebra compiler for computationally efficient numerical models: Success and potential area of improvement
title_short Accelerated linear algebra compiler for computationally efficient numerical models: Success and potential area of improvement
title_sort accelerated linear algebra compiler for computationally efficient numerical models: success and potential area of improvement
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9956871/
https://www.ncbi.nlm.nih.gov/pubmed/36827434
http://dx.doi.org/10.1371/journal.pone.0282265
work_keys_str_mv AT hexuzhen acceleratedlinearalgebracompilerforcomputationallyefficientnumericalmodelssuccessandpotentialareaofimprovement