
Hierarchical cycle accounting: a new method for application performance tuning

To address the growing difficulty of performance debugging on modern processors with increasingly complex micro-architectures, we present Hierarchical Cycle Accounting (HCA), a structured, hierarchical, architecture-agnostic methodology for the identification of performance issues in workloads runni...


Bibliographic Details
Main authors: Nowak, Andrzej, Levinthal, David, Zwaenepoel, Willy
Language: eng
Published: 2015
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1109/ISPASS.2015.7095790
http://cds.cern.ch/record/2311935
author Nowak, Andrzej
Levinthal, David
Zwaenepoel, Willy
collection CERN
description To address the growing difficulty of performance debugging on modern processors with increasingly complex micro-architectures, we present Hierarchical Cycle Accounting (HCA), a structured, hierarchical, architecture-agnostic methodology for the identification of performance issues in workloads running on these modern processors. HCA reports to the user the cost of a number of execution components, such as load latency, memory bandwidth, instruction starvation, and branch misprediction. A critical novel feature of HCA is that all cost components are presented in the same unit, core pipeline cycles. Their relative importance can therefore be compared directly. These cost components are furthermore presented in a hierarchical fashion, with architecture-agnostic components at the top levels of the hierarchy and architecture-specific components at the bottom. This hierarchical structure is useful in guiding the performance debugging effort to the places where it can be the most effective. For a given architecture, the cost components are computed based on the observation of architecture-specific events, typically provided by a performance monitoring unit (PMU), and using a set of formulas to attribute a certain cost in cycles to each event. The selection of what PMU events to use, their validation, and the derivation of the formulas are done offline by an architecture expert, thereby freeing the non-expert from the burdensome and error-prone task of directly interpreting PMU data. We have implemented the HCA methodology in Gooda, a publicly available tool. We describe the application of Gooda to the analysis of several workloads in wide use, showing how HCA's features facilitated performance debugging for these applications. We also describe the discovery of relevant bugs in Intel hardware and the Linux Kernel as a result of using HCA.
id oai-inspirehep.net-1665982
institution European Organization for Nuclear Research
language eng
publishDate 2015
record_format invenio
title Hierarchical cycle accounting: a new method for application performance tuning
topic Computing and Computers
url https://dx.doi.org/10.1109/ISPASS.2015.7095790
http://cds.cern.ch/record/2311935
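
The description above says that HCA attributes a cost in core pipeline cycles to each observed PMU event and presents the resulting components as a hierarchy. A minimal sketch of that idea follows. It is not Gooda's implementation: the event names, the per-event penalties, the counts, and the tree layout are all hypothetical values chosen for illustration, and real formulas would be derived and validated per architecture by an expert.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CostNode:
    """One cost component; leaves map a (hypothetical) PMU event to cycles."""
    name: str
    event: Optional[str] = None   # hypothetical PMU event name (leaf only)
    penalty: float = 0.0          # assumed cycles charged per event (leaf only)
    children: List["CostNode"] = field(default_factory=list)

    def cycles(self, counts: Dict[str, int]) -> float:
        # Inner nodes sum their children; leaves apply count * penalty.
        if self.children:
            return sum(child.cycles(counts) for child in self.children)
        return counts.get(self.event, 0) * self.penalty


def report(node: CostNode, counts: Dict[str, int],
           total_cycles: int, depth: int = 0) -> List[str]:
    """Render the hierarchy with every component in the same unit (cycles)."""
    cost = node.cycles(counts)
    share = 100.0 * cost / total_cycles
    lines = [f"{'  ' * depth}{node.name}: {cost:.0f} cycles ({share:.1f}%)"]
    for child in node.children:
        lines.extend(report(child, counts, total_cycles, depth + 1))
    return lines


# Hypothetical hierarchy: generic components on top, specific events below.
tree = CostNode("stall cycles", children=[
    CostNode("load latency", children=[
        CostNode("L2 hit", event="l2_hit_loads", penalty=12.0),
        CostNode("DRAM", event="dram_loads", penalty=200.0),
    ]),
    CostNode("branch misprediction", event="branch_mispredicts", penalty=15.0),
    CostNode("instruction starvation", event="icache_misses", penalty=20.0),
])

# Made-up event counts and total cycle count for one profiled run.
counts = {
    "l2_hit_loads": 1000,
    "dram_loads": 100,
    "branch_mispredicts": 2000,
    "icache_misses": 500,
}
total_cycles = 200_000

if __name__ == "__main__":
    print("\n".join(report(tree, counts, total_cycles)))
```

Because every component is expressed in cycles, the components can be compared directly and summed up the tree, which is the property the description highlights.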