Introduce Loader interface to support alternative event and metric initialization methods #433

Draft
wants to merge 83 commits into base: main

83 commits
1246248
refactor metric and event loading
harp-intel Jul 12, 2025
919fdc3
support for dynamic loader (incomplete)
harp-intel Jul 18, 2025
6ff2341
replace metric aliases
harp-intel Jul 18, 2025
2bca9e1
refactor metric and event loading
harp-intel Jul 12, 2025
d1f8ffa
support for dynamic loader (incomplete)
harp-intel Jul 18, 2025
1af1dd1
replace metric aliases
harp-intel Jul 18, 2025
9cbd1ec
Merge branch 'loader' of github.com:intel/PerfSpect into loader
harp-intel Jul 18, 2025
006921e
other events and more
harp-intel Jul 21, 2025
d1bd224
enhance getExpression to replace fixed counter event names with regex…
harp-intel Jul 21, 2025
833624a
refactor: update UniqueID format for uncore events to use 'UNC_' prefix
harp-intel Jul 21, 2025
05799f3
refactor: update metric lookup to use LegacyName instead of MetricName
harp-intel Jul 21, 2025
717a0be
enhance: add function to remove 'metric_' prefix from metric names
harp-intel Jul 21, 2025
7003a8c
refactor: streamline metric configuration by consolidating constant r…
harp-intel Jul 22, 2025
c1b7cc8
refactor: optimize metric processing by using index-based iteration f…
harp-intel Jul 22, 2025
a274c9b
emr
harp-intel Jul 22, 2025
ef88efb
refactor: update logging in metric transformation and uncore event ha…
harp-intel Jul 22, 2025
24c01ea
refactor: enhance metric loading by removing uncollectable events and…
harp-intel Jul 22, 2025
78dbd46
ocr with custom msr
harp-intel Jul 22, 2025
3758311
rename dynamic to perfmon and static to legacy
harp-intel Jul 22, 2025
b4647c2
merge changes from main
harp-intel Jul 23, 2025
081bbc0
add copyright to source files
harp-intel Jul 23, 2025
2a297dc
use maps.Copy
harp-intel Jul 23, 2025
76bf520
refactor metric and event loading
harp-intel Jul 12, 2025
920f5d6
support for dynamic loader (incomplete)
harp-intel Jul 18, 2025
7ef5cc8
replace metric aliases
harp-intel Jul 18, 2025
9b91745
refactor metric and event loading
harp-intel Jul 12, 2025
22f4cf3
other events and more
harp-intel Jul 21, 2025
ee56369
enhance getExpression to replace fixed counter event names with regex…
harp-intel Jul 21, 2025
671ef26
refactor: update UniqueID format for uncore events to use 'UNC_' prefix
harp-intel Jul 21, 2025
cc18d7b
refactor: update metric lookup to use LegacyName instead of MetricName
harp-intel Jul 21, 2025
3694fc1
enhance: add function to remove 'metric_' prefix from metric names
harp-intel Jul 21, 2025
2e5969b
refactor: streamline metric configuration by consolidating constant r…
harp-intel Jul 22, 2025
a4de85e
refactor: optimize metric processing by using index-based iteration f…
harp-intel Jul 22, 2025
2e07c5b
emr
harp-intel Jul 22, 2025
cdc2e9c
refactor: update logging in metric transformation and uncore event ha…
harp-intel Jul 22, 2025
d7e7203
refactor: enhance metric loading by removing uncollectable events and…
harp-intel Jul 22, 2025
b1e7fad
ocr with custom msr
harp-intel Jul 22, 2025
dd0fcf5
rename dynamic to perfmon and static to legacy
harp-intel Jul 22, 2025
1ad1202
add copyright to source files
harp-intel Jul 23, 2025
10b9a47
use maps.Copy
harp-intel Jul 23, 2025
18482b8
fix merge
harp-intel Jul 23, 2025
ad8151b
Merge branch 'loader' of github.com:intel/PerfSpect into loader
harp-intel Jul 23, 2025
26085c7
read all perfmon metric fields into struct
harp-intel Jul 23, 2025
d5b305e
refactor: remove omitempty from JSON tags in PerfmonMetric structures
harp-intel Jul 23, 2025
233e9f9
alternate tma metrics
harp-intel Jul 24, 2025
f9aabf6
fix alternate definition loading
harp-intel Jul 24, 2025
8981aa0
uncollectable events when in cgroup or process scope
harp-intel Jul 24, 2025
da8798a
add spr fix ocr
harp-intel Jul 24, 2025
65a790d
Merge branch 'main' into loader
harp-intel Jul 25, 2025
fcc1c47
only one group for topdown.slots
harp-intel Jul 28, 2025
1ac62da
refactor: improve event collectability checks in OtherEvent
harp-intel Jul 28, 2025
3fd6ba2
async-profiler moved
harp-intel Jul 28, 2025
e7a88a6
checkpoint checking
harp-intel Jul 29, 2025
773051e
Merge branch 'main' into loader
harp-intel Jul 29, 2025
6c69dbd
add missing alternative formulas for SPR TMA
harp-intel Jul 29, 2025
d7134a3
update alternate TMA formulas for EMR and GNR
harp-intel Jul 30, 2025
02f14ac
refactor: simplify merging logic for CoreGroup and UncoreGroup
harp-intel Jul 30, 2025
152482b
adjust comment
harp-intel Aug 1, 2025
c339634
more debug output
harp-intel Aug 1, 2025
ef34e0c
refactor
harp-intel Aug 1, 2025
147d6c4
print groups for debugging
harp-intel Aug 1, 2025
5dd3538
TMA events in multiple groups
harp-intel Aug 2, 2025
80c2178
OFFCORE_REQUESTS events supported on AWS VMs
harp-intel Aug 2, 2025
34f6ef6
use perfmon loader for ICX
harp-intel Aug 3, 2025
9820e93
refactor: consolidate metric configuration and retire latency handlin…
harp-intel Aug 4, 2025
37096ff
refactor: move retire latency handling to loader_perfmon and clean up…
harp-intel Aug 4, 2025
ac9e9d5
Remove obsolete metrics configuration for GenuineIntel architecture f…
harp-intel Aug 4, 2025
7953723
perfmon loader for SRF
harp-intel Aug 4, 2025
89ad835
refactor: streamline fixed purpose counter handling in AddEvent method
harp-intel Aug 4, 2025
d6d2101
merge upstream
harp-intel Aug 5, 2025
47548bb
add missing copyright
harp-intel Aug 5, 2025
1490d6e
rename perfmon_util.go to loader_util.go
harp-intel Aug 5, 2025
5c579f0
rearrange resource files
harp-intel Aug 5, 2025
a1bf269
Merge branch 'main' into loader
harp-intel Aug 5, 2025
503187b
rename file
harp-intel Aug 5, 2025
b336dfb
refactor: improve error handling for event group assignments in metri…
harp-intel Aug 5, 2025
ad5bf24
refactor: replace fmt.Printf with slog logging for event formatting e…
harp-intel Aug 6, 2025
b51a5af
Update cmd/metrics/resources/perfmon/srf/srf.json
harp-intel Aug 6, 2025
109ddc6
Update cmd/metrics/resources/perfmon/icx/icx.json
harp-intel Aug 6, 2025
6c5ff2d
Update cmd/metrics/resources/perfmon/spr/spr.json
harp-intel Aug 6, 2025
1c94f98
Merge branch 'main' into loader
harp-intel Aug 7, 2025
0ed661a
Merge branch 'main' into loader
harp-intel Aug 8, 2025
7047456
add support for filtering metrics
harp-intel Aug 9, 2025
14 changes: 12 additions & 2 deletions cmd/metrics/event_frame.go
@@ -118,12 +118,22 @@ func parseEvents(rawEvents [][]byte, eventGroupDefinitions []GroupDefinition) ([
 	previousEvent := ""
 	var eventsNotCounted []string
 	var eventsNotSupported []string
-	for _, rawEvent := range rawEvents {
+	for i, rawEvent := range rawEvents {
 		event, err := parseEventJSON(rawEvent) // nosemgrep
 		if err != nil {
-			slog.Error(err.Error(), slog.String("event", string(rawEvent)))
+			// on error, log the current line and up to 4 following lines
+			out := string(rawEvent)
+			for j := i + 1; j < len(rawEvents) && j < i+5; j++ {
+				out += "\n" + string(rawEvents[j])
+			}
+			slog.Error(err.Error(), slog.String("perf output", out))
 			return nil, err
 		}
+		// sometimes perf will prepend "cpu/" to the topdown event names, e.g., cpu/topdown-retiring/, we clean it up here to match metric formulas
+		if strings.HasPrefix(event.Event, "cpu/") && strings.Contains(event.Event, "topdown") && strings.HasSuffix(event.Event, "/") {
+			event.Event = strings.TrimPrefix(event.Event, "cpu/")
+			event.Event = strings.TrimSuffix(event.Event, "/")
+		}
 		switch event.CounterValue {
 		case "<not counted>":
 			slog.Debug("event not counted", slog.String("event", string(rawEvent)))
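The two behaviors added to parseEvents in this hunk can be tried in isolation. The following is a minimal sketch, not code from the PR: the helper names normalizeTopdownName and errorContext are hypothetical, and errorContext takes the number of extra lines as an explicit parameter instead of hard-coding a bound.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeTopdownName (hypothetical helper) strips the "cpu/" prefix and
// trailing "/" that perf sometimes wraps around topdown event names.
// Names that are not topdown events are returned unchanged.
func normalizeTopdownName(name string) string {
	if strings.HasPrefix(name, "cpu/") && strings.Contains(name, "topdown") && strings.HasSuffix(name, "/") {
		name = strings.TrimPrefix(name, "cpu/")
		name = strings.TrimSuffix(name, "/")
	}
	return name
}

// errorContext (hypothetical helper) joins line i with at most `extra`
// following lines so a parse error can be logged with surrounding output.
func errorContext(lines [][]byte, i, extra int) string {
	out := string(lines[i])
	for j := i + 1; j < len(lines) && j <= i+extra; j++ {
		out += "\n" + string(lines[j])
	}
	return out
}

func main() {
	fmt.Println(normalizeTopdownName("cpu/topdown-retiring/")) // topdown-retiring
	fmt.Println(normalizeTopdownName("cpu/instructions/"))     // cpu/instructions/

	lines := [][]byte{[]byte("bad line"), []byte("next 1"), []byte("next 2")}
	fmt.Println(errorContext(lines, 0, 1))
}
```

Passing the line budget as a parameter keeps the comment and the loop bound from drifting apart.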
75 changes: 75 additions & 0 deletions cmd/metrics/loader.go
@@ -0,0 +1,75 @@
+package metrics
+
+// Copyright (C) 2021-2025 Intel Corporation
+// SPDX-License-Identifier: BSD-3-Clause
+
+import (
+	"fmt"
+	"log/slog"
+	"strings"
+
+	"github.com/Knetic/govaluate"
+)
+
+type MetricDefinition struct {
+	Name        string                         `json:"name"`
+	Expression  string                         `json:"expression"`
+	Description string                         `json:"description"`
+	Variables   map[string]int                 // parsed from Expression for efficiency, int represents group index
+	Evaluable   *govaluate.EvaluableExpression // parse expression once, store here for use in metric evaluation
+}
+
+// EventDefinition represents a single perf event
+type EventDefinition struct {
+	Raw    string
+	Name   string
+	Device string
+}
+
+// GroupDefinition represents a group of perf events
+type GroupDefinition []EventDefinition
+
+type Loader interface {
+	Load(metricDefinitionOverridePath string, eventDefinitionOverridePath string, selectedMetrics []string, metadata Metadata) (metrics []MetricDefinition, groups []GroupDefinition, err error)
+}
+
+type BaseLoader struct {
+	microarchitecture string
+}
+
+type LegacyLoader struct {
+	BaseLoader
+}
+
+type PerfmonLoader struct {
+	BaseLoader
+}
+
+func NewLoader(uarch string) (Loader, error) {
+	switch strings.ToLower(uarch) {
+	case "clx", "skx", "bdx", "bergamo", "genoa", "turin":
+		slog.Debug("Using legacy loader for microarchitecture", slog.String("uarch", uarch))
+		return newLegacyLoader(strings.ToLower(uarch)), nil
+	case "gnr", "srf", "emr", "spr", "icx":
+		slog.Debug("Using perfmon loader for microarchitecture", slog.String("uarch", uarch))
+		return newPerfmonLoader(strings.ToLower(uarch)), nil
+	default:
+		return nil, fmt.Errorf("unsupported microarchitecture: %s", uarch)
+	}
+}
+
+func newLegacyLoader(uarch string) *LegacyLoader {
+	return &LegacyLoader{
+		BaseLoader: BaseLoader{
+			microarchitecture: uarch,
+		},
+	}
+}
+
+func newPerfmonLoader(uarch string) *PerfmonLoader {
+	return &PerfmonLoader{
+		BaseLoader: BaseLoader{
+			microarchitecture: uarch,
+		},
+	}
+}
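To see how a caller might consume the factory in loader.go, here is a reduced, self-contained sketch. It is an illustration under assumptions, not the PR's code: the Load signature is simplified to take no arguments and return a description string, the lower-case type names and the newLoader function are stand-ins, and the loaders do no real file loading.

```go
package main

import (
	"fmt"
	"strings"
)

// Loader is a simplified stand-in for the interface in loader.go. The real
// Load takes override paths, selected metric names, and Metadata.
type Loader interface {
	Load() (string, error)
}

// baseLoader holds the state shared by both implementations via embedding.
type baseLoader struct{ microarchitecture string }

type legacyLoader struct{ baseLoader }
type perfmonLoader struct{ baseLoader }

// Load on each stub only reports which loader would handle the request.
func (l *legacyLoader) Load() (string, error) {
	return "legacy definitions for " + l.microarchitecture, nil
}

func (l *perfmonLoader) Load() (string, error) {
	return "perfmon definitions for " + l.microarchitecture, nil
}

// newLoader mirrors the switch in NewLoader: the microarchitecture name,
// compared case-insensitively, selects the implementation.
func newLoader(uarch string) (Loader, error) {
	u := strings.ToLower(uarch)
	switch u {
	case "clx", "skx", "bdx", "bergamo", "genoa", "turin":
		return &legacyLoader{baseLoader{u}}, nil
	case "gnr", "srf", "emr", "spr", "icx":
		return &perfmonLoader{baseLoader{u}}, nil
	default:
		return nil, fmt.Errorf("unsupported microarchitecture: %s", uarch)
	}
}

func main() {
	for _, uarch := range []string{"SPR", "clx", "unknown"} {
		loader, err := newLoader(uarch)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		desc, _ := loader.Load()
		fmt.Println(desc)
	}
}
```

Because callers depend only on the interface, moving a microarchitecture from one loader to the other (as the "use perfmon loader for ICX" commit does) is a one-line change in the switch.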
174 changes: 104 additions & 70 deletions cmd/metrics/event_defs.go → cmd/metrics/loader_legacy.go
@@ -3,10 +3,9 @@ package metrics
 // Copyright (C) 2021-2025 Intel Corporation
 // SPDX-License-Identifier: BSD-3-Clause

-// helper functions for parsing and interpreting the architecture-specific perf event definition files
-
 import (
 	"bufio"
+	"encoding/json"
 	"fmt"
 	"io/fs"
 	"log/slog"
@@ -19,19 +18,73 @@ import (
 	mapset "github.com/deckarep/golang-set/v2"
 )

-// EventDefinition represents a single perf event
-type EventDefinition struct {
-	Raw    string
-	Name   string
-	Device string
+func (l *LegacyLoader) Load(metricDefinitionOverridePath string, eventDefinitionOverridePath string, selectedMetrics []string, metadata Metadata) ([]MetricDefinition, []GroupDefinition, error) {
+	loadedMetricDefinitions, err := loadMetricDefinitions(metricDefinitionOverridePath, selectedMetrics, metadata)
+	if err != nil {
+		return nil, nil, fmt.Errorf("failed to load metric definitions: %w", err)
+	}
+	loadedEventGroups, uncollectableEvents, err := loadEventGroups(eventDefinitionOverridePath, metadata)
+	if err != nil {
+		return nil, nil, fmt.Errorf("failed to load event group definitions: %w", err)
+	}
+	configuredMetricDefinitions, err := configureMetrics(loadedMetricDefinitions, uncollectableEvents, metadata)
+	if err != nil {
+		return nil, nil, fmt.Errorf("failed to configure metrics: %w", err)
+	}
+	return configuredMetricDefinitions, loadedEventGroups, nil
 }

-// GroupDefinition represents a group of perf events
-type GroupDefinition []EventDefinition
+// loadMetricDefinitions reads and parses metric definitions from an architecture-specific metric
+// definition file. When the override path argument is empty, the function will load metrics from
+// the file associated with the platform's architecture found in the provided metadata. When
+// a list of metric names is provided, only those metric definitions will be loaded.
+func loadMetricDefinitions(metricDefinitionOverridePath string, selectedMetrics []string, metadata Metadata) (metrics []MetricDefinition, err error) {
+	var bytes []byte
+	if metricDefinitionOverridePath != "" {
+		bytes, err = os.ReadFile(metricDefinitionOverridePath) // #nosec G304
+		if err != nil {
+			return
+		}
+	} else {
+		uarch := strings.ToLower(strings.Split(metadata.Microarchitecture, "_")[0])
+		uarch = strings.Split(uarch, " ")[0]
+		metricFileName := fmt.Sprintf("%s.json", uarch)
+		if bytes, err = resources.ReadFile(filepath.Join("resources", "legacy", "metrics", metadata.Architecture, metadata.Vendor, metricFileName)); err != nil {
+			return
+		}
+	}
+	var metricsInFile []MetricDefinition
+	if err = json.Unmarshal(bytes, &metricsInFile); err != nil {
+		return
+	}
+	// if a list of metric names provided, reduce list to match
+	if len(selectedMetrics) > 0 {
+		// confirm provided metric names are valid (included in metrics defined in file)
+		// and build list of metrics based on provided list of metric names
+		metricMap := make(map[string]MetricDefinition)
+		for _, metric := range metricsInFile {
+			metricMap[metric.Name] = metric
+		}
+		for _, selectedMetricName := range selectedMetrics {
+			if _, ok := metricMap[selectedMetricName]; !ok {
+				err = fmt.Errorf("provided metric name not found: %s", selectedMetricName)
+				return
+			}
+			metrics = append(metrics, metricMap[selectedMetricName])
+		}
+	} else {
+		metrics = metricsInFile
+	}
+	// abbreviate event names in metrics to shorten the eventual perf stat command line
+	for i := range metrics {
+		metrics[i].Expression = abbreviateEventName(metrics[i].Expression)
+	}
+	return
+}

-// LoadEventGroups reads the events defined in the architecture specific event definition file, then
+// loadEventGroups reads the events defined in the architecture specific event definition file, then
 // expands them to include the per-device uncore events
-func LoadEventGroups(eventDefinitionOverridePath string, metadata Metadata) (groups []GroupDefinition, uncollectableEvents []string, err error) {
+func loadEventGroups(eventDefinitionOverridePath string, metadata Metadata) (groups []GroupDefinition, uncollectableEvents []string, err error) {
 	var file fs.File
 	if eventDefinitionOverridePath != "" {
 		file, err = os.Open(eventDefinitionOverridePath) // #nosec G304
@@ -41,22 +94,14 @@ func LoadEventGroups(eventDefinitionOverridePath string, metadata Metadata) (gro
 	} else {
 		uarch := strings.ToLower(strings.Split(metadata.Microarchitecture, "_")[0])
 		uarch = strings.Split(uarch, " ")[0]
-		// use alternate events/metrics when TMA fixed counters are not supported
-		alternate := ""
-		if (uarch == "icx" || uarch == "spr" || uarch == "emr" || uarch == "gnr") && !metadata.SupportsFixedTMA { // AWS/GCP VM instances
-			alternate = "_nofixedtma"
-		}
-		eventFileName := fmt.Sprintf("%s%s.txt", uarch, alternate)
-		if file, err = resources.Open(filepath.Join("resources", "events", metadata.Architecture, metadata.Vendor, eventFileName)); err != nil {
+		eventFileName := fmt.Sprintf("%s.txt", uarch)
+		if file, err = resources.Open(filepath.Join("resources", "legacy", "events", metadata.Architecture, metadata.Vendor, eventFileName)); err != nil {
 			return
 		}
 	}
 	defer file.Close()
 	scanner := bufio.NewScanner(file)
 	uncollectable := mapset.NewSet[string]()
-	if flagTransactionRate == 0 {
-		uncollectable.Add("TXN")
-	}
 	var group GroupDefinition
 	for scanner.Scan() {
 		line := strings.TrimSpace(scanner.Text())
@@ -104,65 +149,22 @@ func LoadEventGroups(eventDefinitionOverridePath string, metadata Metadata) (gro
 	return
 }

-// abbreviateEventName replaces long event names with abbreviations to reduce the length of the perf command.
-// focus is on uncore events because they are repeated for each uncore device
-func abbreviateEventName(event string) string {
-	// Abbreviations must be unique and in order. And, if replacing UNC_*, the abbreviation must begin with "UNC" because this is how we identify uncore events when collapsing them.
-	var abbreviations = [][]string{
-		{"UNC_CHA_TOR_INSERTS", "UNCCTI"},
-		{"UNC_CHA_TOR_OCCUPANCY", "UNCCTO"},
-		{"UNC_CHA_CLOCKTICKS", "UNCCCT"},
-		{"UNC_M_CAS_COUNT_SCH", "UNCMCC"},
-		{"IA_MISS_DRD_REMOTE", "IMDR"},
-		{"IA_MISS_DRD_LOCAL", "IMDL"},
-		{"IA_MISS_LLCPREFDATA", "IMLP"},
-		{"IA_MISS_LLCPREFRFO", "IMLR"},
-		{"IA_MISS_DRD_PREF_LOCAL", "IMDPL"},
-		{"IA_MISS_DRD_PREF_REMOTE", "IMDRP"},
-		{"IA_MISS_CRD_PREF", "IMCP"},
-		{"IA_MISS_RFO_PREF", "IMRP"},
-		{"IA_MISS_RFO", "IMRF"},
-		{"IA_MISS_CRD", "IMC"},
-		{"IA_MISS_DRD", "IMD"},
-		{"IO_PCIRDCUR", "IPCI"},
-		{"IO_ITOMCACHENEAR", "IITN"},
-		{"IO_ITOM", "IITO"},
-		{"IMD_OPT", "IMDO"},
-	}
-	// if an abbreviation key is found in the event, replace the matching portion of the event with the abbreviation
-	for _, abbr := range abbreviations {
-		event = strings.Replace(event, abbr[0], abbr[1], -1)
-	}
-	return event
-}
-
 // isCollectableEvent confirms if given event can be collected on the platform
 func isCollectableEvent(event EventDefinition, metadata Metadata) bool {
 	// fixed-counter TMA
 	if !metadata.SupportsFixedTMA && (event.Name == "TOPDOWN.SLOTS" || strings.HasPrefix(event.Name, "PERF_METRICS.")) {
 		slog.Debug("Fixed counter TMA not supported on target", slog.String("event", event.Name))
 		return false
 	}
-	// PEBS events (not supported on GCP c4 VMs)
-	pebsEventNames := []string{"INT_MISC.UNKNOWN_BRANCH_CYCLES", "UOPS_RETIRED.MS"}
-	if !metadata.SupportsPEBS {
-		for _, pebsEventName := range pebsEventNames {
-			if strings.Contains(event.Name, pebsEventName) {
-				slog.Debug("PEBS events not supported on target", slog.String("event", event.Name))
-				return false
-			}
-		}
-	}
 	// short-circuit for cpu events that aren't off-core response events
-	if event.Device == "cpu" && !(strings.HasPrefix(event.Name, "OCR") || strings.HasPrefix(event.Name, "OFFCORE_REQUESTS_OUTSTANDING")) {
+	if event.Device == "cpu" && !strings.HasPrefix(event.Name, "OCR") {
 		return true
 	}
-	// off-core response events
-	if event.Device == "cpu" && (strings.HasPrefix(event.Name, "OCR") || strings.HasPrefix(event.Name, "OFFCORE_REQUESTS_OUTSTANDING")) {
-		if !(metadata.SupportsOCR && metadata.SupportsUncore) {
-			slog.Debug("Off-core response events not supported on target", slog.String("event", event.Name))
-			return false
-		} else if flagScope == scopeProcess || flagScope == scopeCgroup {
+	// short-circuit off-core response events
+	if event.Device == "cpu" &&
+		strings.HasPrefix(event.Name, "OCR") &&
+		metadata.SupportsUncore {
+		if flagScope == scopeProcess || flagScope == scopeCgroup {
 			slog.Debug("Off-core response events not supported in process or cgroup scope", slog.String("event", event.Name))
 			return false
 		}
@@ -296,3 +298,35 @@ func expandUncoreGroups(groups []GroupDefinition, metadata Metadata) (expandedGr
 	}
 	return
 }
+
+// abbreviateEventName replaces long event names with abbreviations to reduce the length of the perf command.
+// focus is on uncore events because they are repeated for each uncore device
+func abbreviateEventName(event string) string {
+	// Abbreviations must be unique and in order. And, if replacing UNC_*, the abbreviation must begin with "UNC" because this is how we identify uncore events when collapsing them.
+	var abbreviations = [][]string{
+		{"UNC_CHA_TOR_INSERTS", "UNCCTI"},
+		{"UNC_CHA_TOR_OCCUPANCY", "UNCCTO"},
+		{"UNC_CHA_CLOCKTICKS", "UNCCCT"},
+		{"UNC_M_CAS_COUNT_SCH", "UNCMCC"},
+		{"IA_MISS_DRD_REMOTE", "IMDR"},
+		{"IA_MISS_DRD_LOCAL", "IMDL"},
+		{"IA_MISS_LLCPREFDATA", "IMLP"},
+		{"IA_MISS_LLCPREFRFO", "IMLR"},
+		{"IA_MISS_DRD_PREF_LOCAL", "IMDPL"},
+		{"IA_MISS_DRD_PREF_REMOTE", "IMDRP"},
+		{"IA_MISS_CRD_PREF", "IMCP"},
+		{"IA_MISS_RFO_PREF", "IMRP"},
+		{"IA_MISS_RFO", "IMRF"},
+		{"IA_MISS_CRD", "IMC"},
+		{"IA_MISS_DRD", "IMD"},
+		{"IO_PCIRDCUR", "IPCI"},
+		{"IO_ITOMCACHENEAR", "IITN"},
+		{"IO_ITOM", "IITO"},
+		{"IMD_OPT", "IMDO"},
+	}
+	// if an abbreviation key is found in the event, replace the matching portion of the event with the abbreviation
+	for _, abbr := range abbreviations {
+		event = strings.Replace(event, abbr[0], abbr[1], -1)
+	}
+	return event
+}
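The comment in abbreviateEventName says the abbreviations must be "in order". A small sketch shows why: when one key is a prefix of another (IA_MISS_DRD and IA_MISS_DRD_REMOTE), applying the shorter key first leaves a half-replaced name. The abbreviate helper and the two-entry tables below are illustrative reductions, not the PR's code, and the event name in main is only an example input.

```go
package main

import (
	"fmt"
	"strings"
)

// abbreviate applies each {long, short} pair in table order, replacing every
// occurrence, the same way the loop in abbreviateEventName does.
func abbreviate(expr string, table [][2]string) string {
	for _, abbr := range table {
		expr = strings.ReplaceAll(expr, abbr[0], abbr[1])
	}
	return expr
}

func main() {
	// Longer key first, as in the PR's table.
	ordered := [][2]string{
		{"IA_MISS_DRD_REMOTE", "IMDR"},
		{"IA_MISS_DRD", "IMD"},
	}
	// Shorter key first: the prefix match fires before the full name can.
	reversed := [][2]string{
		{"IA_MISS_DRD", "IMD"},
		{"IA_MISS_DRD_REMOTE", "IMDR"},
	}

	expr := "UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE"
	fmt.Println(abbreviate(expr, ordered))  // UNC_CHA_TOR_INSERTS.IMDR
	fmt.Println(abbreviate(expr, reversed)) // UNC_CHA_TOR_INSERTS.IMD_REMOTE
}
```

The same sequential behavior explains the final {"IMD_OPT", "IMDO"} entry in the table: it matches text produced by an earlier replacement, so it has to come after the IA_MISS_DRD entry.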