
Cholesky factorization on SIMD multi-core architectures

Many linear algebra libraries, such as the Intel MKL, Magma or Eigen, provide fast Cholesky factorization. These libraries are suited for big matrices but perform slowly on small ones. Even though State-of-the-Art studies begin to take an interest in small matrices, they usually feature a few hundre...


Bibliographic Details
Main Authors:	Lemaitre, Florian; Couturier, Benjamin; Lacassagne, Lionel
Language:	eng
Published:	2017
Subjects:	Computing and Computers
Online Access:	https://dx.doi.org/10.1016/j.sysarc.2017.06.005 http://cds.cern.ch/record/2319798

_version_	1780958456353652736
author	Lemaitre, Florian Couturier, Benjamin Lacassagne, Lionel
author_facet	Lemaitre, Florian Couturier, Benjamin Lacassagne, Lionel
author_sort	Lemaitre, Florian
collection	CERN
description	Many linear algebra libraries, such as the Intel MKL, Magma or Eigen, provide fast Cholesky factorization. These libraries are suited for big matrices but perform slowly on small ones. Even though state-of-the-art studies are beginning to take an interest in small matrices, these usually feature a few hundred rows. Fields like Computer Vision or High Energy Physics use tiny matrices. In this paper we show that it is possible to speed up the Cholesky factorization for tiny matrices by grouping them in batches and using highly specialized code. We provide High Level Transformations that accelerate the factorization for current multi-core and many-core SIMD architectures (SSE, AVX2, KNC, AVX512, Neon, Altivec). We focus on the fact that, on some architectures, compilers are unable to vectorize, and on other architectures, vectorizing compilers are not efficient. Thus hand-made SIMDization is mandatory. With these transformations combined with SIMD, we achieve a speedup from ×14 to ×28 for the whole resolution in single precision compared to the naive code on an AVX2 machine, and a speedup from ×6 to ×14 in double precision, both with strong scalability.
id	oai-inspirehep.net-1673798
institution	European Organization for Nuclear Research
language	eng
publishDate	2017
record_format	invenio
spelling	oai-inspirehep.net-1673798 2019-09-30T06:29:59Z doi:10.1016/j.sysarc.2017.06.005 http://cds.cern.ch/record/2319798 eng Lemaitre, Florian Couturier, Benjamin Lacassagne, Lionel Cholesky factorization on SIMD multi-core architectures Computing and Computers Many linear algebra libraries, such as the Intel MKL, Magma or Eigen, provide fast Cholesky factorization. These libraries are suited for big matrices but perform slowly on small ones. Even though state-of-the-art studies are beginning to take an interest in small matrices, these usually feature a few hundred rows. Fields like Computer Vision or High Energy Physics use tiny matrices. In this paper we show that it is possible to speed up the Cholesky factorization for tiny matrices by grouping them in batches and using highly specialized code. We provide High Level Transformations that accelerate the factorization for current multi-core and many-core SIMD architectures (SSE, AVX2, KNC, AVX512, Neon, Altivec). We focus on the fact that, on some architectures, compilers are unable to vectorize, and on other architectures, vectorizing compilers are not efficient. Thus hand-made SIMDization is mandatory. With these transformations combined with SIMD, we achieve a speedup from ×14 to ×28 for the whole resolution in single precision compared to the naive code on an AVX2 machine, and a speedup from ×6 to ×14 in double precision, both with strong scalability. oai:inspirehep.net:1673798 2017
spellingShingle	Computing and Computers Lemaitre, Florian Couturier, Benjamin Lacassagne, Lionel Cholesky factorization on SIMD multi-core architectures
title	Cholesky factorization on SIMD multi-core architectures
title_full	Cholesky factorization on SIMD multi-core architectures
title_fullStr	Cholesky factorization on SIMD multi-core architectures
title_full_unstemmed	Cholesky factorization on SIMD multi-core architectures
title_short	Cholesky factorization on SIMD multi-core architectures
title_sort	cholesky factorization on simd multi-core architectures
topic	Computing and Computers
url	https://dx.doi.org/10.1016/j.sysarc.2017.06.005 http://cds.cern.ch/record/2319798
work_keys_str_mv	AT lemaitreflorian choleskyfactorizationonsimdmulticorearchitectures AT couturierbenjamin choleskyfactorizationonsimdmulticorearchitectures AT lacassagnelionel choleskyfactorizationonsimdmulticorearchitectures


Similar Items