Add state and description codes: Part 2 #305
base: master
This re-adds the Redis keys for the Base state, which were mistakenly removed from 71c173f in a rebase. The descriptionCode key references a Redis sorted set containing the currently active subsystem states. The stateCode key references a Redis string storing the (integer) value of the currently active system state (one of IDLE, WORKING, WARNING, ERROR).
This adds the handling of the following system states to the Supervisor:
- DOWNLOAD_UPDATE
- UPDATE_FAILED
- REBOOT
- SHUTDOWN

The Middleware logs a distinct log tag once one of these states becomes active. The Supervisor watches the Middleware log and handles the state changes by setting the respective `descriptionCodes` in the sorted set in Redis and the respective stateCode. The Middleware is notified of the state change in Redis via the named pipe IPC, but doesn't handle the notification yet.
This adds the handling of the `INITIAL_BLOCK_SYNC` (Bitcoin Core IBD) state to the Supervisor. The Supervisor activates the subsystem state when Bitcoin Core switches to IBD and deactivates the state once Bitcoin Core finishes IBD.
This adds an extra HSM heartbeat that is sent when an IPC notification with the topic "systemstate-changed" arrives over the named pipe.
aafd226 to 1ba783f
Added some needed logging to the Supervisor triggers. Reviewing should be easier now.
After updating a running BitBoxBase, the Supervisor logs the following over and over:
Both the Grafana dashboard and the Bitcoin scraper show When
Switching back to IBD == false, the error logging continues, with a new entry every 10 seconds:
The following notes are out of scope for this first iteration, but @0xB10C's thoughts on how to implement them would be interesting. The subsystem states An update can also be triggered from the command line, from a USB drive, or from the maintenance menu (e.g. for a factory reset). It would be better if the The same goes for
FYI, when I merge this PR on top of all other outstanding PRs, I get the following error message, so I guess it needs some minor adjustments after rebasing it:
This error usually means that there is no "idle" state in the sorted set. Did you test on a recent image (including the changes from #300)? Otherwise you'd have to add the Redis keys manually. See 71c173f.
Having the Supervisor watch a systemd log with journald doesn't seem to be a good fit for this. Using a named pipe to notify the Supervisor on triggers should work. The
Seems the key
After adding the key/value manually with
Reviewed middleware part.
This PR adds the handling of the state and description codes by the Middleware and Supervisor. This is based on the groundwork done in #300.
Requirements:
Definition
(taken from this private Google Doc)
Approach
The description codes are implemented using a Redis sorted set. A Redis sorted set has members in the form of `[score] [value]`. The set is sorted by the `[score]`. In the implementation the `[score]` is the priority of a subsystem state and the `[value]` is the code of a subsystem state. A subsystem state is active if it's a member of the sorted set. The state is inactive when it's not in the sorted set. The highest priority is at the top of the sorted set, the lowest priority at the bottom of the sorted set. The Middleware queries the highest priority subsystem state and relays it to the HSM and App.

Special cases:
Implementation
The Supervisor activates and deactivates description codes / subsystem states as soon as it learns that the state has changed (currently by watching logs and Prometheus values). When a new state is set, the Supervisor queries the highest priority description code (the top element of the sorted set) and sets the state code accordingly. The Supervisor then sends a notification via IPC to the Middleware that the subsystem state might have changed. The Middleware lets the HSM know about the new state.
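The flow described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Supervisor code: a plain dict stands in for the Redis sorted set, the priority scores, the integer state code values and the mapping from description code to state code are all made up for the example, and the class and method names are hypothetical. It also assumes that a lower score means a higher priority and that an idle entry always stays in the set.

```python
from enum import IntEnum


class StateCode(IntEnum):
    # The four system states named in this PR; the integer values are assumed.
    IDLE = 0
    WORKING = 1
    WARNING = 2
    ERROR = 3


# Hypothetical table: description code -> (priority score, resulting state code).
# Assumption: a lower score means a higher priority (top of the sorted set).
SUBSYSTEM_STATES = {
    "SHUTDOWN": (10, StateCode.WORKING),
    "REBOOT": (11, StateCode.WORKING),
    "UPDATE_FAILED": (20, StateCode.ERROR),
    "DOWNLOAD_UPDATE": (30, StateCode.WORKING),
    "INITIAL_BLOCK_SYNC": (40, StateCode.WORKING),
    "IDLE": (1000, StateCode.IDLE),
}


class SupervisorSketch:
    """In-memory stand-in for the descriptionCode sorted set and stateCode string."""

    def __init__(self, notify):
        # member -> score, mimicking a sorted set; IDLE is always present.
        self.description_codes = {"IDLE": SUBSYSTEM_STATES["IDLE"][0]}
        self.state_code = StateCode.IDLE
        self.notify = notify  # callable standing in for the named pipe IPC

    def top(self):
        # Highest priority active subsystem state (lowest score in this sketch).
        return min(self.description_codes, key=self.description_codes.get)

    def activate(self, code):
        self.description_codes[code] = SUBSYSTEM_STATES[code][0]
        self._refresh()

    def deactivate(self, code):
        if code != "IDLE":  # keep the idle entry so the set is never empty
            self.description_codes.pop(code, None)
        self._refresh()

    def _refresh(self):
        # Derive the state code from the top element, then notify the Middleware.
        self.state_code = SUBSYSTEM_STATES[self.top()][1]
        self.notify("systemstate-changed")


sent = []
supervisor = SupervisorSketch(sent.append)
supervisor.activate("INITIAL_BLOCK_SYNC")
supervisor.activate("UPDATE_FAILED")
print(supervisor.top(), supervisor.state_code.name)
```

Because the state code is always recomputed from the top element, deactivating a high priority state automatically falls back to whichever lower priority state is still active, and finally to idle.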
For the subsystem states `DOWNLOADING_UPDATE`, `UPDATE_FAILED`, `REBOOT` and `SHUTDOWN` the Middleware logs distinct log tags that let the Supervisor's log watcher know that the subsystem state has changed. The `INITIAL_BLOCK_SYNC` subsystem state is set based on the Prometheus value `bitcoin_ibd`, which is watched by the Prometheus watcher of the Supervisor.

TODOs out of scope for this PR