-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable memory monitoring in CS #391
Conversation
6ca4484
to
8f89e86
Compare
d9dd1d9
to
798579f
Compare
launcher/container_runner.go
Outdated
} | ||
r.logger.Println("node-problem-detector.service successfully started.") | ||
} else { | ||
r.logger.Println("node-problem-detector.service disabled.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"MemoryMonitoring is disabled" to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// a problem daemon in node-problem-detector that collects pre-defined health-related metrics from different system components. | ||
// For now we only consider collecting memory related metrics. | ||
// View the comprehensive configuration details on https://github.com/kubernetes/node-problem-detector/tree/master/pkg/systemstatsmonitor#detailed-configuration-options | ||
type SystemStatsConfig struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this configuration here, I thought system-stats-monitor-cs.json
already have the needed configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, your understanding is right - this systemstats_config.go
is not used anywhere for now, but the value of having this is for future proof: if customers want to enable other monitoring options in the future (cpu, disk, os etc.,), this file will allow them to override the system-stats-monitor.json
on the run time instead of image build time.
progress := make(chan string, 1) | ||
|
||
// Run systemd command in "replace" mode to start the unit and its dependencies, | ||
// possibly replacing already queued jobs that conflict w∏ith this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra character "w∏ith"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
798579f
to
f002347
Compare
@jkl73 Added three another config files in addition to |
return fmt.Errorf("failed to run systemctl [%s] for unit [%s]: %v", cmd, unit, err) | ||
} | ||
|
||
if result := <-progress; result != "done" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will block, do you know the timeout for those dbus function?
Also probably should log something in container_runner.go to indicate it is trying to do some systemctl operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dbus will call the underlying C library. You can check the timeout definitions at https://dbus.freedesktop.org/doc/api/html/group__DBusTimeout.html. The dbus config file usually locates at /usr/share/dbus-1/system.conf
with a default timeout 25s.
Added systemctl loggings to container_runner.go.
f002347
to
f08ba4b
Compare
New Features: [launcher] Add TEE server IPC implementation google#367 [launcher] Enable memory monitoring in CS google#391 Use TDX quote provider to attest and verify google#405 Integrate nonce verification as part of the TDX quote validation procedure. google#395 Add RISC V support google#407 [launcher] Use resizable integrity-fs with in-memory tags google#412 Bug Fixes: [launcher] Fix launcher exit code google#384 [launcher] Handle exit code checking during deferral evaluation google#392 [cmd] Skip tests that call setGCEAKTemplate google#402 [launcher] Fix teeserver context reset issue & add container signature cache google#397 Set all unused parameters as _ to fix CI lint failure google#411 [launcher] Make customtoken test sleep to mitigate clock skew google#413 Other Changes: Add eventlog parse logics for memory monitoring google#404 [launcher]: Add memory monitor measurement logics google#408 Update go-tdx-guest version to v0.3.1 google#414 New Contributors: @KeithMoyer in google#392 @vbalain in google#405 @aimixsaka in google#407
New Features: [launcher] Add TEE server IPC implementation #367 [launcher] Enable memory monitoring in CS #391 Use TDX quote provider to attest and verify #405 Integrate nonce verification as part of the TDX quote validation procedure. #395 Add RISC V support #407 [launcher] Use resizable integrity-fs with in-memory tags #412 Bug Fixes: [launcher] Fix launcher exit code #384 [launcher] Handle exit code checking during deferral evaluation #392 [cmd] Skip tests that call setGCEAKTemplate #402 [launcher] Fix teeserver context reset issue & add container signature cache #397 Set all unused parameters as _ to fix CI lint failure #411 [launcher] Make customtoken test sleep to mitigate clock skew #413 Other Changes: Add eventlog parse logics for memory monitoring #404 [launcher]: Add memory monitor measurement logics #408 Update go-tdx-guest version to v0.3.1 #414 New Contributors: @KeithMoyer in #392 @vbalain in #405 @aimixsaka in #407
New Features: [launcher] Add TEE server IPC implementation google#367 [launcher] Enable memory monitoring in CS google#391 Use TDX quote provider to attest and verify google#405 Integrate nonce verification as part of the TDX quote validation procedure. google#395 Add RISC V support google#407 [launcher] Use resizable integrity-fs with in-memory tags google#412 Bug Fixes: [launcher] Fix launcher exit code google#384 [launcher] Handle exit code checking during deferral evaluation google#392 [cmd] Skip tests that call setGCEAKTemplate google#402 [launcher] Fix teeserver context reset issue & add container signature cache google#397 Set all unused parameters as _ to fix CI lint failure google#411 [launcher] Make customtoken test sleep to mitigate clock skew google#413 Other Changes: Add eventlog parse logics for memory monitoring google#404 [launcher]: Add memory monitor measurement logics google#408 Update go-tdx-guest version to v0.3.1 google#414 New Contributors: @KeithMoyer in google#392 @vbalain in google#405 @aimixsaka in google#407
Enable memory monitoring in CS (see go/monitoring-enablement-in-hardened-image-1-pager).
This PR includes changes to:
systemctl
to use dbus API to startnode-problem-detector.service
.tee.launch_policy.monitoring_memory_allow
to launch policy andtee-monitoring-memory-enable
to launchSpec.nodeproblemdetector
which provides methods to override node-problem-detector configurations. This library will not be used for now, but will be used in the future if operators want to opt-in more metrics monitoring options: cpu, disk, etc.