Skip to content

Conversation

@hhzhang16
Copy link
Contributor

@hhzhang16 hhzhang16 commented Nov 13, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced profiling operations security and reliability by correcting file system permission settings. This ensures profiling jobs have proper access to required resources and reduces potential permission-related failures.
  • Tests

    • Added test case to verify profiling job security context configurations are correctly applied.

@hhzhang16 hhzhang16 requested a review from a team as a code owner November 13, 2025 12:13
@copy-pr-bot
Copy link

copy-pr-bot bot commented Nov 13, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 13, 2025

Walkthrough

The changes add fsGroup security context configuration to the profiling job's PodSpec in the DynamoGraph deployment controller, setting it to 0 to ensure correct permissions for the profiler runtime. A corresponding test case validates this security context configuration.

Changes

Cohort / File(s) Summary
Security Context Configuration
deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller.go
Introduces a local fsGroup variable set to 0 and applies it via SecurityContext FSGroup in the profiling Job's PodSpec to ensure correct volume permissions for the profiler runtime.
Test Coverage
deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller_test.go
Adds a new test case under DGDR Profiler Arguments to verify that the profiling job's pod security context includes fsGroup set to 0, following existing test patterns.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Changes are minor and localized to security context configuration
  • Test case follows established patterns in the test suite
  • No complex logic changes or behavioral modifications

Poem

🐰 A bunny hops with glee today,
FSGroups set in the safest way,
Permissions locked at zero-bound,
Security through and through found!

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description contains only the template structure with all sections empty, providing no implementation details, rationale, or context for the changes. Fill in all template sections: Overview (what problem this solves), Details (specific changes made), Where should the reviewer start (files to review), and Related Issues (with actual issue number instead of #xxx).
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: add fsgroup to ProfilingJob' clearly and concisely describes the main change—adding fsGroup configuration to the profiling job, which matches the actual implementation changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f817c59 and 7a79f40.

📒 Files selected for processing (2)
  • deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller.go (2 hunks)
  • deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1302-1306
Timestamp: 2025-06-11T21:18:00.425Z
Learning: In the Dynamo operator, the project’s preferred security posture is to set a Pod-level `PodSecurityContext` with `runAsUser`, `runAsGroup`, and `fsGroup` all set to `1000`, and then selectively override the user at the individual container level (e.g., `RunAsUser: 0` for Kaniko) when root is required.
📚 Learning: 2025-06-11T21:18:00.425Z
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1302-1306
Timestamp: 2025-06-11T21:18:00.425Z
Learning: In the Dynamo operator, the project’s preferred security posture is to set a Pod-level `PodSecurityContext` with `runAsUser`, `runAsGroup`, and `fsGroup` all set to `1000`, and then selectively override the user at the individual container level (e.g., `RunAsUser: 0` for Kaniko) when root is required.

Applied to files:

  • deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller_test.go
  • deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller.go
🧬 Code graph analysis (1)
deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller_test.go (2)
deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller.go (1)
  • ServiceAccountProfilingJob (102-102)
deploy/cloud/operator/api/v1alpha1/dynamographdeploymentrequest_types.go (3)
  • DynamoGraphDeploymentRequest (229-238)
  • DynamoGraphDeploymentRequestSpec (104-144)
  • ProfilingConfigSpec (50-69)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller.go (2)

1148-1150: Implementation is correct for the chosen fsGroup value.

The SecurityContext is properly applied to the PodSpec. However, ensure the fsGroup value aligns with the project's security requirements (see comment on lines 1128-1130).


1128-1130: Verify whether profiler security context should follow the standard Pod-level pattern or if it justifies an exception.

The profiling job's security context sets only FSGroup: 0 at the Pod level with no runAsUser or runAsGroup, which deviates from the documented security posture. However, the codebase currently contains no other PodSecurityContext implementations to reference for comparison—the profiling job is the only instance.

Additionally, the comment states "runtime images run as user 'dynamo' (UID 1000, GID 0)," yet doesn't set runAsUser: 1000 at the Pod level, which appears inconsistent with the described pattern.

The configuration is intentional (validated in tests), but without access to the reference implementation pattern, please confirm:

  • Is the profiler workload exempt from the standard security posture?
  • If it should follow the pattern, should it set Pod-level runAsUser: 1000, runAsGroup: 1000, fsGroup: 1000 with selective container-level overrides?
deploy/cloud/operator/internal/controller/dynamographdeploymentrequest_controller_test.go (1)

1096-1157: Good test coverage for fsGroup configuration.

The test case is well-structured and properly validates that the fsGroup is set in the pod security context. It follows the existing test patterns and includes proper resource cleanup.

Note: The test validates fsGroup=0, which aligns with the implementation but may need adjustment if the security posture concern raised in the controller file review is addressed.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@grahamking
Copy link
Contributor

Could you clarify the title a little, maybe add a short description? What is fsgroup and what is ProfilingJob? Ourselves in the future will be grateful! :-)

@hhzhang16
Copy link
Contributor Author

/ok to test c6a7aeb

@hhzhang16 hhzhang16 enabled auto-merge (squash) November 13, 2025 23:03
@hhzhang16
Copy link
Contributor Author

/ok to test feb3395

@hhzhang16 hhzhang16 merged commit 436f08d into main Nov 14, 2025
25 of 39 checks passed
@hhzhang16 hhzhang16 deleted the bug-5655352-fix branch November 14, 2025 11:08
hhzhang16 added a commit that referenced this pull request Nov 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants