Summary
Performance Monitoring Unit (PMU) counters provide crucial insights into microarchitectural events within CPUs, such as executed instructions and cache misses. This exploration delves into creating a tool for fetching these counters on Apple Silicon chips, like the M1 and M2, revealing a complex landscape of fixed and programmable counters. The journey uncovers limitations within tools like Apple Instruments, which cannot fetch more than a handful of counters simultaneously because some counters are inherently incompatible with one another. This research highlights not only the challenges of working with undocumented APIs but also the nuances of counter compatibility and the importance of the order in which counters are added.
Highlights:
- Development of a custom tool to fetch PMU counters on Apple Silicon, highlighting the limitations of existing tools.
- Discovery, using both Apple Instruments and the custom tool, that certain PMU counters are inherently incompatible and cannot be fetched simultaneously.
- Reverse-engineering efforts reveal undocumented behaviors of the kperf API on macOS, vital for performance monitoring.
- Significant findings on counter incompatibilities leading to a better understanding of microarchitectural event tracking.
- Practical insights into the ordering of counter additions, which significantly affects compatibility and functionality.
PMU counters on Apple Silicon chips, such as the M1 and M2, let developers understand CPU performance by tracking microarchitectural events like cache misses and instruction execution. The exploration into PMU counters began with limitations encountered in Apple Instruments, which supports only up to 10 counters, leading to the development of a custom tool to fetch all available counters. This tool leveraged the undocumented kperf API, which had to be reverse-engineered before it could be understood and used effectively.
The research revealed a complex picture of counter compatibility: certain counters, when added together, resulted in errors due to overlapping mask values, which represent the internal handling of counter data within the CPU architecture. This was particularly evident in experiments that added counters in various combinations, where even a small change in the order of addition could decide whether an error occurred. The findings highlighted not only specific incompatibilities but also the broader issue of undocumented limitations and behaviors in Apple's performance monitoring framework.
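The order dependence described above can be illustrated with a small sketch. It rests on assumptions not confirmed by the source: that each event carries a bitmask of the hardware counter slots it may occupy, and that events are placed greedily in the lowest free compatible slot. The event names, mask values, and the function add_events_greedy are hypothetical and chosen only for illustration; they are not the kperf API.

```python
# Hypothetical model: each event has a bitmask of counter slots it may use.
# Events are placed one by one in the lowest free slot their mask allows.

def add_events_greedy(events, num_slots=8):
    """Return {event_name: slot}; raise ValueError if an event cannot be placed."""
    used = 0  # bitmask of slots already taken
    assignment = {}
    for name, mask in events:
        free = mask & ~used & ((1 << num_slots) - 1)
        if free == 0:
            raise ValueError(f"cannot add {name}: no free compatible slot")
        slot = (free & -free).bit_length() - 1  # index of the lowest set bit
        used |= 1 << slot
        assignment[name] = slot
    return assignment

# Illustrative masks only (not real Apple Silicon values).
FLEXIBLE = ("EVENT_FLEXIBLE", 0b1110_0000)  # may use slot 5, 6, or 7
PICKY = ("EVENT_PICKY", 0b0010_0000)        # may use slot 5 only

# Adding the picky event first leaves room for the flexible one.
print(add_events_greedy([PICKY, FLEXIBLE]))

# Adding the flexible event first takes slot 5, so the picky one fails.
try:
    add_events_greedy([FLEXIBLE, PICKY])
except ValueError as err:
    print(err)
```

Under this model the same two events succeed or fail depending purely on the order in which they are added, which matches the kind of behavior the experiments observed.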
Ultimately, the tool developed, named 'Lauka', incorporated findings from the reverse-engineered kperf code and provided a more flexible and informative approach to monitoring PMU counters on Apple Silicon. The tool's capability to select and order counters strategically circumvents some of the limitations found in Apple Instruments, offering a deeper insight into CPU performance and optimization. This journey underscored the challenges and complexities of working with proprietary systems and the importance of community-driven research and tool development in uncovering and understanding these systems.
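One way to picture strategic ordering is a "most constrained first" heuristic: sort the requested events so that those with the fewest permitted slots are added first. This is only a sketch under the same assumed slot-mask model; it is not confirmed to be how Lauka orders counters, and all names and mask values below are hypothetical.

```python
# Hypothetical heuristic: add events with the fewest allowed slots first.

def popcount(mask):
    """Number of slots an event's mask allows."""
    return bin(mask).count("1")

def order_most_constrained_first(events):
    """Sort (name, mask) pairs by how few slots each event may use."""
    return sorted(events, key=lambda event: popcount(event[1]))

def try_assign(events):
    """Greedy lowest-free-slot placement; return {name: slot} or None on failure."""
    used = 0
    placed = {}
    for name, mask in events:
        free = mask & ~used
        if free == 0:
            return None
        lowest = free & -free  # isolate the lowest free compatible slot
        used |= lowest
        placed[name] = lowest.bit_length() - 1
    return placed

# Illustrative masks only (not real Apple Silicon values).
requested = [
    ("EVENT_A", 0b1110_0000),  # slots 5, 6, 7
    ("EVENT_B", 0b0110_0000),  # slots 5, 6
    ("EVENT_C", 0b0010_0000),  # slot 5 only
]

print(try_assign(requested))                                # fails in the requested order
print(try_assign(order_most_constrained_first(requested)))  # succeeds after reordering
```

The heuristic is not guaranteed to find a valid placement whenever one exists, but it shows how reordering alone can turn a failing counter set into a working one.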