add resource journal #6586
Open: garlick wants to merge 7 commits into flux-framework:master from garlick:resource_journal
+387
−19
Problem: the reslog class has no way to access the resource inventory, but it will be useful to send journal consumers a copy of R when the resource-define event is emitted. Pass the resource_ctx to reslog_create() instead of just the flux_t handle. Adjust internal uses of the flux_t handle to get it via reslog->ctx->h instead of reslog->h.
Problem: the full resource eventlog, including online/offline events that are not committed to the KVS, may need to be monitored. Keep events in a json array in memory, including the events that were read from the KVS at startup, if any. Filter out any historical resource-define events. These are meant for synchronization on the availability of R and that only pertains to the current instance.
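The filtering described above can be sketched roughly as follows. This is only an illustration in Python, not the C code in this PR: the function names are hypothetical, and it assumes each eventlog entry is a JSON object on its own line with a "name" key.

```python
import json


def load_journal(kvs_eventlog_text):
    """Build the in-memory event list from eventlog text read at startup.

    Assumption: one JSON object per line, each with a "name" key.
    Historical resource-define events are dropped because they only
    signal availability of R in the instance that posted them.
    """
    events = []
    for line in kvs_eventlog_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("name") == "resource-define":
            continue  # stale synchronization event from an earlier instance
        events.append(entry)
    return events


def append_event(events, entry):
    """Record a newly posted event, whether or not it is committed to the KVS."""
    events.append(entry)
    return events
```

Events posted by the current instance, including its own resource-define and the non-persistent online/offline events, would go through append_event() unfiltered.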
garlick force-pushed the resource_journal branch 2 times, most recently from 46bf4a3 to 5e12001 (January 30, 2025 22:04)
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6586      +/-   ##
==========================================
- Coverage   79.47%   79.44%   -0.04%
==========================================
  Files         531      531
  Lines       88433    88588     +155
==========================================
+ Hits        70282    70376      +94
- Misses      18151    18212      +61
Problem: there is no way to observe the journal in real time, with non-persistent online/offline events included. Add a resource.journal RPC with protocol similar to the job manager journal.
Problem: a resource journal consumer will get online/offline events before knowing the size of the instance or the hostname mapping. Post a 'restart' event when the resource module is loaded with the following keys:

  ranks
    An idset containing all valid ranks: 0 to size-1

  online
    An idset containing any ranks that are initially online. This is normally empty except when starting with monitor-force-up in test.

  nodelist
    Contents of the hostlist broker attribute

This event is not made persistent in the KVS resource.eventlog.
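For illustration, a restart event carrying these three keys might be assembled like this. This is a sketch, not the module's code: the helper names are hypothetical, the timestamp/name/context layout is assumed to follow the usual Flux eventlog entry shape, and the idset encoding is a simplified stand-in for libidset.

```python
import time


def encode_idset(ids):
    """Encode integers as comma-separated ranges, e.g. [0, 1, 3] -> "0-1,3".

    Simplified stand-in for a real idset encoder (assumption).
    """
    ids = sorted(set(ids))
    parts = []
    i = 0
    while i < len(ids):
        start = end = ids[i]
        # Extend the current range while the next id is consecutive.
        while i + 1 < len(ids) and ids[i + 1] == end + 1:
            i += 1
            end = ids[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


def make_restart_event(size, online, nodelist, timestamp=None):
    """Build a 'restart' event with the ranks, online, and nodelist keys."""
    return {
        "timestamp": time.time() if timestamp is None else timestamp,
        "name": "restart",
        "context": {
            "ranks": encode_idset(range(size)),  # all valid ranks: 0 to size-1
            "online": encode_idset(online),      # normally empty at startup
            "nodelist": nodelist,                # hostlist broker attribute
        },
    }
```

A consumer that sees this event first can size its rank-to-hostname table before any online/offline events arrive.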
Problem: there is no convenient tool for accessing the resource journal. Add flux resource eventlog [--follow] [--wait=EVENT].
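As a rough picture of how a script might consume that output, the sketch below reads events one JSON object per line and stops once a named event has been seen. The function name is hypothetical, and the stop-on-match behavior is only an assumption about what --wait=EVENT is meant to do.

```python
import json


def follow_until(lines, wait_event=None):
    """Collect journal events from an iterable of JSON lines.

    If wait_event is given, stop after the first event with that name
    (assumed to mirror --wait=EVENT); otherwise read until the input ends.
    """
    seen = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        seen.append(entry)
        if wait_event is not None and entry.get("name") == wait_event:
            break
    return seen
```

For example, piping the output of flux resource eventlog --follow into a script that calls follow_until(sys.stdin, "resource-define") would block until R is available.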
Problem: there are no tests for the resource journal or the flux resource eventlog command. Add a sharness test for this purpose.
Problem: flux resource eventlog has no documentation. Add an entry to the man page.
garlick force-pushed the resource_journal branch from 5e12001 to 3ff612f (January 31, 2025 00:48)
I added a quick |
This adds a resource journal streaming RPC similar to the one offered by the job manager. Just a first cut at this point.

This doesn't change what's posted to the persistent resource.eventlog in the KVS but does add one new event called restart that's only for journal consumption. It provides a baseline for mapping execution targets to hostnames in the current instance, and sets the initial online set after a restart.

Unlike the job manager journal, this doesn't have as much volume to deal with, so no options for event filtering or skipping historical data are provided as yet.

flux resource eventlog can be used to dump and optionally follow this log. This is not currently polished at all - it just dumps the events in JSON form, one per line.

For more detail on what's in this log and how the journal is formatted, see the proposed RFC: