
Pattern-Aware Staging for Hybrid Memory Systems


Bibliographic Details

Main Authors: Arima, Eishi; Schulz, Martin
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295340/
http://dx.doi.org/10.1007/978-3-030-50743-5_24
Description
Summary: The ever-increasing demand for higher memory performance and, at the same time, larger memory capacity is leading the industry towards hybrid main memory designs, i.e., memory systems that consist of multiple different memory technologies. This trend, however, naturally leads to one important question: how can we efficiently utilize such hybrid memories? Our paper proposes a software-based approach to solve this challenge by deploying a pattern-aware staging technique. Our work is based on the following observations: (a) the high-bandwidth fast memory outperforms the large memory for memory-intensive tasks; (b) but those tasks can run for much longer than a bulk data copy to/from the fast memory, especially when the access pattern is more irregular/sparse. We exploit these observations by applying the following staging technique if the accesses are irregular and sparse: (1) copying a chunk (a few GB of sequential data) from large to fast memory; (2) performing a memory-intensive task on the chunk; and (3) writing it back to the large memory. To check the regularity/sparseness of the accesses at runtime with negligible performance impact, we develop a lightweight pattern detection mechanism using a helper-threading-inspired approach with two different Bloom filters. Our case study using various scientific codes on a real system shows that our approach achieves significant speed-ups compared to executions using only the large memory or hardware caching: 3× and 41% speedups in the best cases, respectively.
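The three-step staging loop described in the summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: the chunk size, the `process_chunk` kernel, and the way the fast-memory buffer is obtained (here a plain caller-supplied buffer) are all stand-ins; in a real hybrid-memory system the fast buffer would be allocated in high-bandwidth memory (e.g., via a NUMA-aware or HBM-specific allocator).

```c
#include <stddef.h>
#include <string.h>

/* Chunk size in elements; the paper stages chunks of a few GB,
 * this small value is purely illustrative. */
#define CHUNK_ELEMS 1024

/* Hypothetical memory-intensive kernel applied to one staged chunk. */
static void process_chunk(double *buf, size_t n) {
    for (size_t i = 0; i < n; i++)
        buf[i] = buf[i] * 2.0 + 1.0;
}

/* Pattern-aware staging path:
 * (1) copy a chunk of sequential data from large to fast memory,
 * (2) run the memory-intensive task on the chunk in fast memory,
 * (3) write the chunk back to the large memory. */
void staged_process(double *large, size_t total_elems, double *fast) {
    for (size_t off = 0; off < total_elems; off += CHUNK_ELEMS) {
        size_t n = total_elems - off;
        if (n > CHUNK_ELEMS) n = CHUNK_ELEMS;
        memcpy(fast, large + off, n * sizeof(double));   /* step (1) */
        process_chunk(fast, n);                          /* step (2) */
        memcpy(large + off, fast, n * sizeof(double));   /* step (3) */
    }
}
```

The design point the paper exploits is that the two bulk `memcpy` calls are sequential and therefore cheap relative to an irregular/sparse kernel, so paying the copy cost to run step (2) out of fast memory is a net win.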
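The runtime pattern detection could look roughly like the sketch below, under loose assumptions: sampled access addresses are inserted into a "current" Bloom filter while being probed against a "previous" one, so a low hit ratio indicates little reuse, i.e., an irregular/sparse pattern that would select the staging path. The filter size, hash functions, and the two-filter windowing are illustrative choices, not the paper's exact mechanism.

```c
#include <stdint.h>
#include <stddef.h>

#define BLOOM_BITS 4096

typedef struct { uint8_t bits[BLOOM_BITS / 8]; } bloom_t;

/* Two simple multiplicative hashes over an address (illustrative only). */
static size_t h1(uintptr_t a) { return (size_t)(a * 2654435761u) % BLOOM_BITS; }
static size_t h2(uintptr_t a) { return (size_t)(a * 40503u + 11u) % BLOOM_BITS; }

static void bloom_add(bloom_t *b, uintptr_t a) {
    b->bits[h1(a) / 8] |= (uint8_t)(1u << (h1(a) % 8));
    b->bits[h2(a) / 8] |= (uint8_t)(1u << (h2(a) % 8));
}

static int bloom_query(const bloom_t *b, uintptr_t a) {
    return ((b->bits[h1(a) / 8] >> (h1(a) % 8)) & 1) &&
           ((b->bits[h2(a) / 8] >> (h2(a) % 8)) & 1);
}

/* Probe sampled addresses against the previous window's filter while
 * recording them in the current one; a helper thread could run this off
 * the critical path. Returns the fraction of probes that hit. */
double bloom_hit_ratio(const bloom_t *prev, bloom_t *cur,
                       const uintptr_t *addrs, size_t n) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (bloom_query(prev, addrs[i])) hits++;
        bloom_add(cur, addrs[i]);
    }
    return n ? (double)hits / (double)n : 0.0;
}
```

Because Bloom filters admit false positives but never false negatives, a low measured hit ratio is a reliable signal of low reuse, which is exactly the cheap, approximate check the summary's "negligible performance impact" goal calls for.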