cleanup collect and add deamonset (#21)
hilldani authored Apr 4, 2023
1 parent 9c61c46 commit b293ac3
Showing 22 changed files with 321 additions and 1,926 deletions.
2 changes: 0 additions & 2 deletions Makefile
@@ -46,8 +46,6 @@ build-public/postprocess:
--add-data "./events/metric_bdx.json:." \
--add-data "./events/metric_icx.json:." \
--add-data "./events/metric_spr.json:." \
--add-data "./events/metric_icx_aws.json:." \
--add-data "./events/metric_spr_aws.json:." \
--runtime-tmpdir . \
--exclude-module readline
cp $(TMPDIR)/dist/perf-postprocess build/
109 changes: 51 additions & 58 deletions README.md
@@ -8,39 +8,47 @@ The tool has two parts
1. perf collection to collect underlying PMU (Performance Monitoring Unit) counters
2. post processing that generates csv output of performance metrics.

## Quick start (requires perf installed)
### Quick start (requires perf installed)

```
wget -qO- https://github.com/intel/PerfSpect/releases/latest/download/perfspect.tgz | tar xvz
cd perfspect
sudo ./perf-collect --timeout 10
sudo ./perf-postprocess -r results/perfstat.csv --html perfstat.html
```

### Deploy in Kubernetes

Modify the template [deamonset.yml](docs/daemonset.yml) to deploy in kubernetes

![basic_stats](https://raw.githubusercontent.com/wiki/intel/PerfSpect/basic_stats.JPG)
![perfspect-demo1](https://raw.githubusercontent.com/wiki/intel/PerfSpect/demo.gif)

## Requirements

### Packages:

- **perf** - PerfSpect uses the Linux perf tool to collect PMU counters

### Supported kernels
### Minimum supported kernels

| Xeon Generation | Minimum Kernel |
| - | - |
| Broadwell | kernel 4.15 |
| Skylake | kernel 4.15 |
| Cascadelake | kernel 4.15 |
| Icelake | kernel 5.9 |
| Sapphire Rapids | kernel 5.12 |
| Xeon Generation | centos 7+ | ubuntu 16.04+ |
| --------------- | --------- | ------------- |
| Broadwell | 3.10 | 4.15 |
| Skylake | 3.10 | 4.15 |
| Cascadelake | 3.10 | 4.15 |
| Icelake | 3.10 | 4.15 |
| Sapphire Rapids | 5.12 | 5.12 |
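
To check a host against this table before collecting, a minimal Python sketch could compare the running kernel's major.minor version with the listed minimum. It uses only the ubuntu 16.04+ column; the `MIN_KERNEL_UBUNTU` mapping and the helper names are illustrative and not part of PerfSpect, and the sketch does not account for distribution backports.

```python
import re

# Minimum kernel (major, minor) per Xeon generation, taken from the
# "ubuntu 16.04+" column of the table. Illustrative mapping, not PerfSpect code.
MIN_KERNEL_UBUNTU = {
    "Broadwell": (4, 15),
    "Skylake": (4, 15),
    "Cascadelake": (4, 15),
    "Icelake": (4, 15),
    "Sapphire Rapids": (5, 12),
}


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '5.15.0-1034-aws'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(generation: str, release: str) -> bool:
    """True if the release is at least the listed minimum for the generation."""
    return kernel_version(release) >= MIN_KERNEL_UBUNTU[generation]
```

On a live system, `platform.release()` from the standard library returns the running kernel release, so `meets_minimum("Icelake", platform.release())` checks the current host.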

### Supported Operating Systems:

- Ubuntu 16.04 and newer
- centos 7 and newer
- Amazon Linux 2
- RHEL 9
- Debian 11

*Note: PerfSpect may work on other Linux distributions, but has not been thoroughly tested*
_Note: PerfSpect may work on other Linux distributions, but has not been thoroughly tested_

## Build from source

@@ -51,81 +59,69 @@ pip3 install -r requirements.txt
make
```

On successful build, binaries will be created in "dist" folder
On successful build, binaries will be created in `dist` folder

## Collection:

```
(sudo) ./perf-collect (options) -- Some options can be used only with root privileges
Options:
usage: perf-collect [-h] [-t TIMEOUT | -a APP]
[-p PID | -c CID | --thread | --socket] [-V] [-i INTERVAL]
[-m MUXINTERVAL] [-o OUTCSV] [-v]
optional arguments:
-h, --help show this help message and exit
-v, --version display version info
-e EVENTFILE, --eventfile EVENTFILE
Event file containing events to collect,
default=events/<architecture specific file>
-t TIMEOUT, --timeout TIMEOUT
perf event collection time
-a APP, --app APP Application to run with perf-collect, perf collection
ends after workload completion
-p PID, --pid PID perf-collect on selected PID(s)
-c CID, --cid CID perf-collect on selected container ids
--thread Collect for thread metrics
--socket Collect for socket metrics
-V, --version display version info
-i INTERVAL, --interval INTERVAL
interval in seconds for time series dump, default=1
-m MUXINTERVAL, --muxinterval MUXINTERVAL
event mux interval in milli seconds, default=0 i.e. will
use the system default
event mux interval in milli seconds, default=0 i.e.
will use the system default
-o OUTCSV, --outcsv OUTCSV
perf stat output in csv format,
default=results/perfstat.csv
-a APP, --app APP Application to run with perf-collect, perf collection ends
after workload completion
-p PID, --pid PID perf-collect on selected PID(s)
-c CID, --cid CID perf-collect on selected container ids
-t TIMEOUT, --timeout TIMEOUT
perf event collection time
--percore Enable per core event collection
--nogroups Disable perf event grouping, events are grouped by default
as in the event file
--dryrun Test if Performance Monitoring Counters are in-use, and
collect stats for 10sec to validate event file correctness
--metadata collect system info only, does not run perf
-csp CLOUD, --cloud CLOUD
Name of the Cloud Service Provider(AWS), if collecting on
cloud instances. Currently supporting AWS and OCI
-ct CLOUDTYPE, --cloudtype CLOUDTYPE
Instance type: Options include - VM,BM
-v, --verbose Display debugging information
```

### Examples

1. sudo ./perf-collect (collect PMU counters using predefined architecture specific event file until collection is terminated)
2. sudo ./perf-collect -m 10 -t 30 (sets event multiplexing interval to 10ms and collects PMU counters for 30 seconds using default architecture specific event file)
3. sudo ./perf-collect -a "myapp.sh myparameter" (collect perf for myapp.sh)
4. sudo ./perf-collect --dryrun (checks PMU usage, and collects PMU counters for 10 seconds using default architecture specific event file)
5. sudo ./perf-collect --metadata (collect system info and PMU event info without running perf, uses default outputfile if -o option is not used)
6. sudo ./perf-collect --cid "one or more container IDs from docker or kubernetes seperated by semicolon"
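
When perf-collect is driven from a script, the examples above can be assembled programmatically. A minimal sketch, using only flags that appear in the usage text (`--timeout`, `--app`, `--cid`, `--outcsv`); `build_collect_cmd` is a hypothetical helper that only builds the argument list and does not run anything.

```python
import shlex


def build_collect_cmd(timeout=None, app=None, cids=None, outcsv=None):
    """Assemble a perf-collect argument list (hypothetical helper, runs nothing)."""
    if timeout is not None and app is not None:
        # The usage line lists -t and -a as alternatives: [-t TIMEOUT | -a APP]
        raise ValueError("use either timeout or app, not both")
    cmd = ["sudo", "./perf-collect"]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if app is not None:
        cmd += ["--app", app]
    if cids:
        # The --cid example separates multiple container IDs with semicolons.
        cmd += ["--cid", ";".join(cids)]
    if outcsv is not None:
        cmd += ["--outcsv", outcsv]
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line instead of executing it.
    print(shlex.join(build_collect_cmd(timeout=30, outcsv="results/perfstat.csv")))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it; keeping it a list avoids shell quoting problems for arguments like `"myapp.sh myparameter"`.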

### Notes

1. Intel CPUs (until Cascadelake) have 3 fixed PMUs (cpu-cycles, ref-cycles, instructions) and 4 programmable PMUs. The events are grouped in event files with this assumption. However, some of the counters may not be available on some CPUs. You can check the correctness of the event file with dryrun and check the output for anamolies. Typically output will have "not counted", "unsuppported" or zero values for cpu-cycles if number of available counters are less than events in a group.
2. Globally pinned events can limit the number of counters available for perf event groups. On X86 systems NMI watchdog pins a fixed counter by default. NMI watchdog is disabled during perf collection if run as a sudo user. If NMI watchdog can't be disabled, event grouping will be forcefully disabled to let perf driver handle event multiplexing.
2. sudo ./perf-collect -a "myapp.sh myparameter" (collect perf for myapp.sh)
3. sudo ./perf-collect --cid "one or more container IDs from docker or kubernetes seperated by semicolon"
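
The NMI watchdog note can be checked before a run. On Linux its state is exposed through the standard sysctl file `/proc/sys/kernel/nmi_watchdog` (`1` enabled, `0` disabled). A minimal sketch that only reads that file; it does not reflect how perf-collect itself handles the watchdog.

```python
from pathlib import Path
from typing import Optional

# Standard Linux sysctl file for the NMI watchdog ("1" = enabled, "0" = disabled).
NMI_WATCHDOG = Path("/proc/sys/kernel/nmi_watchdog")


def nmi_watchdog_enabled(path: Path = NMI_WATCHDOG) -> Optional[bool]:
    """Return the watchdog state, or None if the file is absent (e.g. non-Linux host)."""
    try:
        value = path.read_text().strip()
    except FileNotFoundError:
        return None
    return value != "0"
```

If this returns `True` and perf-collect cannot run with root privileges, grouping falls back as described in the note; an administrator can disable the watchdog beforehand with `sysctl -w kernel.nmi_watchdog=0`.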

## Post-processing:

```
./perf-postprocess (options)
Options:
usage: perf-postprocess [-h] [--version] [-m METRICFILE] [-o OUTFILE]
[--persocket] [--percore] [-v] [--epoch] [-html HTML]
[-r RAWFILE]
perf-postprocess: perf post process
optional arguments:
-h, --help show this help message and exit
--version, -v display version information
--version, -V display version information
-m METRICFILE, --metricfile METRICFILE
formula file, default metric file for the architecture
-o OUTFILE, --outfile OUTFILE
perf stat outputs in csv format,
default=results/metric_out.csv
--persocket generate per socket metrics
--percore generate per core metrics
--keepall keep all intermediate csv files, use it for debug purpose
only
-v, --verbose include debugging information, keeps all intermediate
csv files
--epoch time series in epoch format, default is sample count
-csp CLOUD, --cloud CLOUD
Name of Cloud Service Provider(AWS), if you're intending
to postprocess on cloud instances
-html HTML, --html HTML
Static HTML report
@@ -145,16 +141,13 @@ required arguments:
1. metric_out.csv : Time series dump of the metrics. The metrics are defined in events/metric.json
2. metric_out.averags.csv: Average of metrics over the collection period
3. metric_out.raw.csv: csv file with raw events normalized per second
4. Socket/core level metrics: Additonal csv files outputfile.socket.csv/outputfile.core.csv will be generated. Socket/core level data will be added as new sheets if excel output is chosen
4. Socket/core level metrics: Additonal csv files outputfile.socket.csv/outputfile.core.csv will be generated.
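
The averages file is described as the mean of each metric over the collection period, and the raw file as event counts normalized per second. A minimal sketch of those two calculations over in-memory data; it does not read PerfSpect's CSV files, whose column layout is not shown in this diff, and the metric name used below is only an example.

```python
from statistics import fmean


def average_metrics(samples: dict) -> dict:
    """Mean of each metric's time series; metrics with no samples are skipped."""
    return {name: fmean(values) for name, values in samples.items() if values}


def per_second(counts: list, interval_s: float = 1.0) -> list:
    """Convert per-interval event counts to events per second.

    interval_s mirrors perf-collect's -i/--interval option (default 1 second).
    """
    return [c / interval_s for c in counts]
```

For example, `average_metrics({"CPI": [1.0, 2.0, 3.0]})` gives `{"CPI": 2.0}`, and `per_second([2000, 4000], interval_s=2.0)` gives `[1000.0, 2000.0]`.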

## Caveats

1. The tool can collect only the counters supported by underlying linux perf version.
2. Current version supports Intel Sapphire Rapids, Icelake, Cascadelake, Skylake and Broadwell microarchitectures only.
3. Perf collection overhead will increase with increase in number of counters and/or dump interval. Using the right perf multiplexing (check perf-collection.py Notes for more details) interval to reduce overhead
4. If you run into locale issues - `UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 4519: ordinal not in range(128)`, more than likely the locales needs to be set appropriately. You could also try running post-process step with `LC_ALL=C.UTF-8 LANG=C.UTF-8 ./perf-postprocess -r result.csv`
5. The percore option is not supported while using cid.
6. The html report creation is not yet supported for cid collection.
2. If you run into locale issues - `UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 4519: ordinal not in range(128)`, more than likely the locales needs to be set appropriately. You could also try running post-process step with `LC_ALL=C.UTF-8 LANG=C.UTF-8 ./perf-postprocess -r result.csv`
3. The html report creation is not yet supported for cid collection.
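
The locale workaround can also be applied from a Python wrapper by passing a modified environment to `subprocess.run`. A minimal sketch, assuming the `perf-postprocess` binary sits in the current directory as in the quick start and using only its `-r` and `--html` options; the helper names are hypothetical.

```python
import os
import subprocess
from typing import Optional


def postprocess_cmd(rawfile: str, html: Optional[str] = None) -> list:
    """Build a perf-postprocess argument list from the -r and --html options."""
    cmd = ["./perf-postprocess", "-r", rawfile]
    if html is not None:
        cmd += ["--html", html]
    return cmd


def utf8_env() -> dict:
    """Copy of the current environment with a UTF-8 locale forced."""
    return {**os.environ, "LC_ALL": "C.UTF-8", "LANG": "C.UTF-8"}


def run_postprocess(rawfile: str, html: Optional[str] = None) -> None:
    # Raises subprocess.CalledProcessError if perf-postprocess exits non-zero.
    subprocess.run(postprocess_cmd(rawfile, html), env=utf8_env(), check=True)
```

`run_postprocess("results/perfstat.csv", "perfstat.html")` corresponds to the quick-start post-processing command with the locale variables set.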

## How to contribute

2 changes: 1 addition & 1 deletion _version.txt
@@ -1 +1 @@
1.2.0
1.2.5
28 changes: 28 additions & 0 deletions docs/daemonset.yml
@@ -0,0 +1,28 @@
apiVersion: v1
kind: Namespace
metadata:
  name: intel
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: perfspect
  name: perfspect
  namespace: intel
spec:
  selector:
    matchLabels:
      app: perfspect
  template:
    metadata:
      labels:
        app: perfspect
    spec:
      containers:
      - image: <your perfspect image>
        name: perfspect
        securityContext:
          privileged: true
      hostPID: true
      restartPolicy: Always
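
The template leaves the container image as the placeholder `<your perfspect image>`. A minimal sketch for filling it in and applying the result with `kubectl apply -f -`, which reads a manifest from stdin; building and publishing a PerfSpect image is outside its scope, and the helper names are hypothetical.

```python
import subprocess

PLACEHOLDER = "<your perfspect image>"


def render_daemonset(template: str, image: str) -> str:
    """Replace the image placeholder in the DaemonSet template text."""
    if PLACEHOLDER not in template:
        raise ValueError("template does not contain the image placeholder")
    return template.replace(PLACEHOLDER, image)


def apply(manifest: str) -> None:
    # 'kubectl apply -f -' reads the manifest from stdin.
    subprocess.run(["kubectl", "apply", "-f", "-"], input=manifest, text=True, check=True)
```

For example, read `docs/daemonset.yml`, call `render_daemonset(text, "registry.example.com/perfspect:latest")` with your own image reference, and pass the result to `apply`.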
7 changes: 4 additions & 3 deletions events/bdx.txt
@@ -16,7 +16,7 @@ instructions;
cpu/event=0xd1,umask=0x01,period=2000003,name='MEM_LOAD_RETIRED.L1_HIT'/,
cpu/event=0xd1,umask=0x02,period=100003,name='MEM_LOAD_UOPS_RETIRED.L2_HIT'/,
cpu/event=0xd1,umask=0x10,period=50021,name='MEM_LOAD_UOPS_RETIRED.L2_MISS'/,
cpu/event=0x3c,umask=0x0,any=1,period=2000003,name='CPU_CLK_UNHALTED.THREAD_ANY'/,
cpu/event=0x3c,umask=0x0,period=2000003,name='CPU_CLK_UNHALTED.THREAD_ANY'/,
cpu-cycles,
ref-cycles,
instructions;
@@ -120,13 +120,14 @@ cstate_pkg/c6-residency/;
#uops delivered from different units
cpu/event=0x0e,umask=0x01,period=2000003,name='UOPS_ISSUED.ANY'/,
cpu/event=0xc2,umask=0x02,period=2000003,name='UOPS_RETIRED.RETIRE_SLOTS'/,
cpu/event=0x0d,umask=0x03,cmask=1,any=1,period=2000003,name='INT_MISC.RECOVERY_CYCLES_ANY'/,
cpu/event=0x0d,umask=0x03,cmask=1,period=2000003,name='INT_MISC.RECOVERY_CYCLES_ANY'/,
cpu-cycles,
ref-cycles,
instructions;

cpu/event=0x3c,umask=0x2,name='CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE'/,
cpu/event=0x3c,umask=0x1,any=1,name='CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY'/,
cpu/event=0x3c,umask=0x1,name='CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY'/;

#offcore response
cpu/event=0xb7,umask=0x01,offcore_rsp=0x103FC007F7,name='OCR.ALL_READS.L3_MISS.REMOTE_HITM'/,
cpu/event=0xb7,umask=0x01,offcore_rsp=0x083FC007F7,name='OCR.ALL_READS.L3_MISS.REMOTE_HIT_FORWARD'/;
7 changes: 4 additions & 3 deletions events/clx.txt
@@ -43,11 +43,11 @@ cpu/event=0x28,umask=0x18,period=200003,name='CORE_POWER.LVL1_TURBO_LICENSE'/,
cpu/event=0x28,umask=0x20,period=200003,name='CORE_POWER.LVL2_TURBO_LICENSE'/,
cpu/event=0x0e,umask=0x01,period=2000003,name='UOPS_ISSUED.ANY'/;

cpu/event=0x3c,umask=0x0,any=1,period=2000003,name='CPU_CLK_UNHALTED.THREAD_ANY'/,
cpu/event=0x3c,umask=0x0,period=2000003,name='CPU_CLK_UNHALTED.THREAD_ANY'/,
cpu/event=0x9c,umask=0x01,period=2000003,name='IDQ_UOPS_NOT_DELIVERED.CORE'/,
cpu/event=0xc2,umask=0x02,period=2000003,name='UOPS_RETIRED.RETIRE_SLOTS'/,
#INT_MISC.RECOVERY_CYCLES_ANY
cpu/event=0x0d,umask=0x01,any=1,period=2000003,name='INT_MISC.RECOVERY_CYCLES_ANY'/;
cpu/event=0x0d,umask=0x01,period=2000003,name='INT_MISC.RECOVERY_CYCLES_ANY'/;

cpu/event=0x79,umask=0x30,period=2000003,name='IDQ.MS_UOPS'/,
cpu/event=0x60,umask=0x10,period=2000003,name='OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD'/,
@@ -112,7 +112,8 @@ cpu/event=0xb1,umask=0x02,cmask=0x4,period=2000003,name='UOPS_EXECUTED.CORE_CYCL
cpu-cycles;

cpu/event=0x3c,umask=0x2,name='CPU_CLK_THREAD_UNHALTED.ONE_THREAD_ACTIVE'/,
cpu/event=0x3c,umask=0x1,any=1,name='CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY'/,
cpu/event=0x3c,umask=0x1,name='CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY'/;

#offcore response
cpu/event=0xb7,umask=0x01,offcore_rsp=0x103FC007F7,name='OCR.ALL_READS.L3_MISS.REMOTE_HITM'/,
cpu/event=0xb7,umask=0x01,offcore_rsp=0x083FC007F7,name='OCR.ALL_READS.L3_MISS.REMOTE_HIT_FORWARD'/;
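
This commit removes the `any=1` modifier from the `CPU_CLK_UNHALTED.THREAD_ANY`, `INT_MISC.RECOVERY_CYCLES_ANY` and `CPU_CLK_THREAD_UNHALTED.REF_XCLK_ANY` events. The event files shown here put one event per line, end each group with `;`, and use `#` for comments. A minimal sketch, assuming exactly that format (PerfSpect's own parser may handle more cases), that splits a file into groups and lists any events still carrying `any=1`.

```python
import re

# Matches any=1 as a complete term inside a cpu/.../ event definition.
ANY_THREAD = re.compile(r"[/,]any=1[,/]")


def parse_event_groups(text: str) -> list:
    """Split event-file text into groups; ';' closes a group, '#' lines are comments."""
    groups, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        current.append(line.rstrip(",;"))
        if line.endswith(";"):
            groups.append(current)
            current = []
    if current:  # tolerate a final group without a trailing ';'
        groups.append(current)
    return groups


def events_with_any_thread(groups: list) -> list:
    """Return the event strings that still use the any=1 modifier."""
    return [event for group in groups for event in group if ANY_THREAD.search(event)]
```

Running `events_with_any_thread(parse_event_groups(open("events/bdx.txt").read()))` on the updated file should return an empty list if every `any=1` occurrence was removed.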