A journey over the memory management stack for large HPC applications on modern architectures


Bibliographic Details
Main author: Valat, Sébastien
Language: eng
Published: 2019
Subjects:
Online access: http://cds.cern.ch/record/2691972
Description
Summary: Memory management has always been an issue for large applications, but the growth of memory capacity and of intra-node thread-based parallelism now puts much more pressure on this complex part of the operating system stack. Although there is a long tradition of algorithm development on this topic, with 60 years of research behind it, there is still a lot to do. This is even more true for large-scale applications, where the size of the code (the target was a million-line C++/MPI application) and its overall complexity severely limit the ability to apply what should theoretically be the clean way to proceed. Today we also need global optimization so that the whole stack interacts well, without letting one component destroy the performance gained by the layers above or below it. After a PhD on memory management in HPC, mostly around a malloc implementation and various studies of kernel memory management for supercomputers and NUMA architectures, I continued as a post-doc developing a memory profiling tool: MALT. During my time at CERN I added NUMAPROF, a NUMA memory profiling tool, to the list. In this talk I will recap the nine-year road I have walked, with experience feedback showing sometimes impressive performance gaps on large real applications, following the path from CPU caches and NUMA layout through the OS paging system and the malloc implementation, and closing with the profiling of real applications. I will try to assemble the full picture, showing the need to keep the global view in order to really reach performance.