Sampling Improvement (#295)
* Dependency update (#286)

Use dependency from Efficios deliverable.

---------

Co-authored-by: Thomas Applencourt <[email protected]>

* Add archive (#287)

Enable usage of session rotation for lossless online trace consumption.

---------

Co-authored-by: Thomas Applencourt <[email protected]>

* Single rank profiling (#288)

* Make only local master do energy profiling.

* Use ZES to query devices in order to get around affinity masks.

* Use ZES for drivers as well.

* set ZES

* Update ze/tracer_ze_helpers.include.c

Co-authored-by: Brice Videau <[email protected]>

* Update ze/tracer_ze_helpers.include.c

Co-authored-by: Brice Videau <[email protected]>

* Update xprof/xprof.rb.in

---------

Co-authored-by: Brice Videau <[email protected]>
Co-authored-by: Thomas Applencourt <[email protected]>

* fabricPort sampling

* Fabric Timeline

Sampling timeline rewrite

timeline cleanup

timeline fix

Minor fix

* Sampling rewrite

updated

* Timeline cleanup

clean up

* include telemetry handles

* uuid based timeline

* Memory sampling

timeline cleanup

* zes_support

* rebased

* Code cleanup and fix

* Update ze/btx_zeinterval_callbacks.cpp

Co-authored-by: Thomas Applencourt <[email protected]>

* Update xprof/xprof.rb.in

Co-authored-by: Thomas Applencourt <[email protected]>

* Update xprof/xprof.rb.in

Co-authored-by: Thomas Applencourt <[email protected]>

* PR corrections

* separate sampling stream

* Deltas and Hash-return handled

* Remove Ze calls for subDevice

* minor change on delta

---------

Co-authored-by: Thomas Applencourt <[email protected]>
Co-authored-by: Brice Videau <[email protected]>
Co-authored-by: sbekele <[email protected]>
Co-authored-by: Solomon Bekele <[email protected]>
10 people authored Oct 23, 2024
1 parent b970fb3 commit dcd4a0c
Showing 7 changed files with 1,149 additions and 374 deletions.
9 changes: 6 additions & 3 deletions utils/xprof_utils.hpp
@@ -55,7 +55,9 @@ typedef intptr_t process_id_t;
 typedef uintptr_t thread_id_t;
 typedef std::string hostname_t;
 typedef std::string thapi_function_name;
-typedef uintptr_t thapi_device_id;
+typedef uint64_t thapi_device_id;
+typedef uint64_t thapi_telemetry_handle;
+typedef uintptr_t thapi_fabricPort_id;
 typedef uint32_t thapi_domain_idx;
 typedef uint32_t thapi_sdevice_idx;

@@ -69,9 +71,10 @@ typedef std::tuple<hostname_t, process_id_t, thread_id_t, thapi_device_id, thapi
 thapi_function_name>
 hpt_device_function_name_t;
 typedef std::tuple<hostname_t, process_id_t, thapi_device_id> hp_device_t;
+typedef std::tuple<hostname_t, thapi_device_id> h_device_t;
 typedef std::tuple<hostname_t, process_id_t, thapi_device_id, thapi_device_id> hp_dsd_t;
-typedef std::tuple<hostname_t, process_id_t, thapi_device_id, thapi_domain_idx> hp_ddomain_t;
-typedef std::tuple<hostname_t, process_id_t, thapi_device_id, thapi_sdevice_idx> hp_dsdev_t;
+typedef std::tuple<hostname_t, thapi_device_id, thapi_telemetry_handle, thapi_domain_idx> h_ddomain_t;
+typedef std::tuple<hostname_t, thapi_device_id, thapi_telemetry_handle, thapi_sdevice_idx, bool> h_dfsdev_t;
 typedef std::tuple<long, long> sd_t;
 typedef std::tuple<thread_id_t, thapi_function_name, long> tfn_ts_t;
 typedef std::tuple<thapi_function_name, long> fn_ts_t;
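
The new host-level tuple typedefs drop the process id and add the telemetry handle, which suggests that samples are grouped per host, device, and handle rather than per process. The sketch below shows one way such a tuple could key an aggregation map. The typedefs are copied from the diff; `SampleStats`, `domain_stats_t`, and `record_sample` are hypothetical names introduced only for illustration and are not part of this commit.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Typedefs as they appear in utils/xprof_utils.hpp after this commit.
typedef std::string hostname_t;
typedef uint64_t thapi_device_id;
typedef uint64_t thapi_telemetry_handle;
typedef uint32_t thapi_domain_idx;
typedef std::tuple<hostname_t, thapi_device_id, thapi_telemetry_handle, thapi_domain_idx> h_ddomain_t;

// Hypothetical accumulator (not from the commit): running count, sum, min, max.
struct SampleStats {
  uint64_t count = 0;
  uint64_t sum = 0;
  uint64_t min = UINT64_MAX;
  uint64_t max = 0;

  void add(uint64_t value) {
    ++count;
    sum += value;
    if (value < min) min = value;
    if (value > max) max = value;
  }
};

// std::tuple compares lexicographically, so it can key a std::map
// without a custom comparator.
typedef std::map<h_ddomain_t, SampleStats> domain_stats_t;

// Hypothetical helper: fold one sample into the bucket identified by
// (hostname, device, telemetry handle, domain index).
inline void record_sample(domain_stats_t &stats, const hostname_t &host,
                          thapi_device_id device, thapi_telemetry_handle handle,
                          thapi_domain_idx domain, uint64_t value) {
  stats[h_ddomain_t(host, device, handle, domain)].add(value);
}
```

With this layout, two samples that share a hostname, device, handle, and domain land in the same bucket, while a different handle on the same device gets its own entry.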
152 changes: 148 additions & 4 deletions xprof/btx_interval_model.yaml
@@ -93,7 +93,21 @@
 :field_class:
 :type: string
 :cast_type: const char*
-- :name: lttng:frequency
+- :name: interval_sampling
+:event_common_context_field_class:
+:type: structure
+:members:
+- :name: hostname
+:field_class:
+:type: string
+:cast_type: const char*
+- :name: ts
+:field_class:
+:type: integer_signed
+:field_value_range: 64
+:cast_type: int64_t
+:event_classes:
+- :name: sampling:frequency
 :payload_field_class:
 :type: structure
 :members:
@@ -102,6 +116,16 @@
 :type: integer_unsigned
 :field_value_range: 64
 :cast_type: uint64_t
+- :name: deviceIdx
+:field_class:
+:type: integer_unsigned
+:field_value_range: 32
+:cast_type: uint32_t
+- :name: hFrequency
+:field_class:
+:type: integer_unsigned
+:field_value_range: 64
+:cast_type: uint64_t
 - :name: domain
 :field_class:
 :type: integer_unsigned
@@ -112,7 +136,7 @@
 :type: integer_unsigned
 :field_value_range: 64
 :cast_type: uint64_t
-- :name: lttng:power
+- :name: sampling:power
 :payload_field_class:
 :type: structure
 :members:
@@ -121,6 +145,16 @@
 :type: integer_unsigned
 :field_value_range: 64
 :cast_type: uint64_t
+- :name: deviceIdx
+:field_class:
+:type: integer_unsigned
+:field_value_range: 32
+:cast_type: uint32_t
+- :name: hPower
+:field_class:
+:type: integer_unsigned
+:field_value_range: 64
+:cast_type: uint64_t
 - :name: domain
 :field_class:
 :type: integer_unsigned
@@ -131,7 +165,7 @@
 :type: integer_unsigned
 :field_value_range: 64
 :cast_type: uint64_t
-- :name: lttng:computeEU
+- :name: sampling:computeEU
 :payload_field_class:
 :type: structure
 :members:
@@ -140,6 +174,16 @@
 :type: integer_unsigned
 :field_value_range: 64
 :cast_type: uint64_t
+- :name: deviceIdx
+:field_class:
+:type: integer_unsigned
+:field_value_range: 32
+:cast_type: uint32_t
+- :name: hEngine
+:field_class:
+:type: integer_unsigned
+:field_value_range: 64
+:cast_type: uint64_t
 - :name: subDevice
 :field_class:
 :type: integer_unsigned
@@ -149,7 +193,7 @@
 :field_class:
 :type: single
 :cast_type: float
-- :name: lttng:copyEU
+- :name: sampling:copyEU
 :payload_field_class:
 :type: structure
 :members:
@@ -158,6 +202,16 @@
 :type: integer_unsigned
 :field_value_range: 64
 :cast_type: uint64_t
+- :name: deviceIdx
+:field_class:
+:type: integer_unsigned
+:field_value_range: 32
+:cast_type: uint32_t
+- :name: hEngine
+:field_class:
+:type: integer_unsigned
+:field_value_range: 64
+:cast_type: uint64_t
 - :name: subDevice
 :field_class:
 :type: integer_unsigned
@@ -167,3 +221,93 @@
 :field_class:
 :type: single
 :cast_type: float
+- :name: sampling:fabricPort
+:payload_field_class:
+:type: structure
+:members:
+- :name: did
+:field_class:
+:type: integer_unsigned
+:field_value_range: 64
+:cast_type: uint64_t
+- :name: deviceIdx
+:field_class:
+:type: integer_unsigned
+:field_value_range: 32
+:cast_type: uint32_t
+- :name: hFabricPort
+:field_class:
+:type: integer_unsigned
+:field_value_range: 64
+:cast_type: uint64_t
+- :name: subDevice
+:field_class:
+:type: integer_unsigned
+:field_value_range: 32
+:cast_type: uint32_t
+- :name: portId
+:field_class:
+:type: integer_unsigned
+:field_value_range: 32
+:cast_type: uint32_t
+- :name: remotePortId
+:field_class:
+:type: integer_unsigned
+:field_value_range: 32
+:cast_type: uint32_t
+- :name: rxThroughput
+:field_class:
+:type: double
+:cast_type: float
+- :name: txThroughput
+:field_class:
+:type: double
+:cast_type: float
+- :name: rxSpeed
+:field_class:
+:type: double
+:cast_type: float
+- :name: txSpeed
+:field_class:
+:type: double
+:cast_type: float
+- :name: sampling:memModule
+:payload_field_class:
+:type: structure
+:members:
+- :name: did
+:field_class:
+:type: integer_unsigned
+:field_value_range: 64
+:cast_type: uint64_t
+- :name: deviceIdx
+:field_class:
+:type: integer_unsigned
+:field_value_range: 32
+:cast_type: uint32_t
+- :name: hMemModule
+:field_class:
+:type: integer_unsigned
+:field_value_range: 64
+:cast_type: uint64_t
+- :name: subDevice
+:field_class:
+:type: integer_unsigned
+:field_value_range: 32
+:cast_type: uint32_t
+- :name: pBandwidth
+:field_class:
+:type: double
+:cast_type: float
+- :name: rdBandwidth
+:field_class:
+:type: double
+:cast_type: float
+- :name: wtBandwidth
+:field_class:
+:type: double
+:cast_type: float
+- :name: occupancy
+:field_class:
+:type: double
+:cast_type: float
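
The commit message mentions that deltas are now handled, and the `sampling:fabricPort` event carries `rxThroughput` and `txThroughput` values. The following is a minimal sketch of how a throughput value might be derived from two successive counter readings. It assumes the counters are cumulative byte counts and the timestamps are in microseconds; the struct and function names (`PortCounters`, `PortThroughput`, `throughput_between`) are hypothetical and do not come from this commit.

```cpp
#include <cstdint>

// Hypothetical snapshot of cumulative port counters (names are illustrative).
struct PortCounters {
  uint64_t timestamp_us; // monotonic timestamp in microseconds (assumed unit)
  uint64_t rx_bytes;     // cumulative bytes received (assumed unit)
  uint64_t tx_bytes;     // cumulative bytes transmitted (assumed unit)
};

// Result of a delta computation; `valid` is false when no rate can be derived.
struct PortThroughput {
  double rx; // bytes per second
  double tx; // bytes per second
  bool valid;
};

// Derive a rate from two snapshots. A non-increasing timestamp or a counter
// that went backwards yields an invalid result instead of a bogus rate.
inline PortThroughput throughput_between(const PortCounters &prev,
                                         const PortCounters &curr) {
  PortThroughput out = {0.0, 0.0, false};
  if (curr.timestamp_us <= prev.timestamp_us || curr.rx_bytes < prev.rx_bytes ||
      curr.tx_bytes < prev.tx_bytes)
    return out;
  const double seconds =
      static_cast<double>(curr.timestamp_us - prev.timestamp_us) / 1e6;
  out.rx = static_cast<double>(curr.rx_bytes - prev.rx_bytes) / seconds;
  out.tx = static_cast<double>(curr.tx_bytes - prev.tx_bytes) / seconds;
  out.valid = true;
  return out;
}
```

The first sample of a stream has no predecessor, so a consumer following this approach would typically store it and only emit a value from the second sample onward.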