
feat(sensor): support NVIDIA Grace Hopper #1884

Merged

4 commits merged on Dec 11, 2024


@rootfs (Contributor) commented Dec 6, 2024

Based on the NVIDIA documentation and some probing on a Grace Hopper (GH) system.

@rootfs marked this pull request as draft on December 6, 2024 00:15

github-actions bot (Contributor) commented Dec 6, 2024

🤖 SeineSailor

Here's a concise summary of the pull request changes:

Summary: This WIP pull request adds support for the NVIDIA Grace Hopper processor to the sensor feature in the devices package.

Key Modifications:

  1. Introduced new types gpuGraceACPI and GraceACPI with device-specific functionality for the Grace Hopper.
  2. Implemented methods for initializing and reading power data from the device.
  3. Updated graceCheck and graceDeviceStartup functions to support the new type.
  4. Added new functions findModulePowerPaths, readPowerFile, and AbsEnergyFromDevice to read power data from the Grace Hopper.

Impact: These changes affect the external interface and behavior of the code, enabling support for the NVIDIA Grace Hopper processor.

Observations/Suggestions:

  • The changes seem to be well-structured and follow a consistent pattern.
  • It would be beneficial to include unit tests to ensure the new functionality works as expected.
  • Consider adding documentation or comments to explain the purpose and usage of the new types and functions.
  • Review the code for any potential performance or security implications related to reading power data from the device.

mvazquezc and others added 2 commits December 11, 2024 11:29
@rootfs marked this pull request as ready for review on December 11, 2024 12:13

@rootfs (Contributor, Author) commented Dec 11, 2024

@KaiyiLiu1234, can you help review it? We need it ASAP, thanks.

@mvazquezc (Contributor) commented

I made some small changes to @rootfs's branch here: rootfs#4

Now I can query the metrics endpoint and this is what I get:

kepler_node_core_joules_total{instance="rhelarm",mode="dynamic",package="0",source="grace-acpi"} 0.139
kepler_node_dram_joules_total{instance="rhelarm",mode="dynamic",package="0",source="grace-acpi"} 0
kepler_node_gpu_joules_total{instance="rhelarm",mode="dynamic",package="0",source="GRACE HOPPER"} 854620.993
kepler_node_info{components_power_source="grace-acpi",cpu_architecture="unknown",platform_power_source="acpi",source="os"} 1
kepler_node_other_joules_total{instance="rhelarm",mode="dynamic",package="socket0",source="grace-acpi"} 116.781
kepler_node_package_joules_total{instance="rhelarm",mode="dynamic",package="0",source="grace-acpi"} 1070.754
kepler_node_platform_joules_total{instance="rhelarm",mode="dynamic",package="energy1",source="acpi"} 195016.869
kepler_node_uncore_joules_total{instance="rhelarm",mode="dynamic",package="0",source="grace-acpi"} 0

@rootfs changed the title from "[WIP] feat(sensor): support NVIDIA Grace Hopper" to "feat(sensor): support NVIDIA Grace Hopper" on Dec 11, 2024
@KaiyiLiu1234 (Collaborator) left a comment


lgtm

Signed-off-by: Huamin Chen <[email protected]>
@rootfs merged commit 20979b2 into sustainable-computing-io:main on Dec 11, 2024
21 of 22 checks passed
3 participants