In the hardware world, there is a concept of a watchdog. The rest of the hardware or software has to periodically state that it is operating properly. If it does not, the watchdog kicks in and performs some operation to get things running again. This could be rebooting the machine or restarting the software.
In the software monitoring world for servers, the monitor calls a health check API and the server replies with healthy or sick. If the server responds with sick, an alarm goes off. In Kubernetes, the server is quiesced (i.e., it receives no new requests) and eventually shut down while another server is started in the cluster.
These are large watchdogs that affect the entire program. I would like a watchdog at the task level. This is useful for a long-running task. As the task runs, it reports back to the watchdog that it is still making progress. If the task does not report back within the timeout, the watchdog times out.
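import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class WatchDogExecutor
{
    // Watchdogs guarding the task that is running on the current thread.
    private static final ThreadLocal<List<WatchDogExecutor>> EXECUTORS =
        ThreadLocal.withInitial(ArrayList::new);

    // Written by the task thread, read by the thread that enforces timeouts.
    private volatile Instant m_lastHeartbeat;

    public static void heartbeat()
    {
        EXECUTORS.get().forEach(WatchDogExecutor::updateHeartbeat);
        // If there are no WatchDogExecutors, do we want to throw an IllegalStateException? Maybe or maybe not. I think this needs to be something the programmer can opt in to or out of.
    }

    private void updateHeartbeat()
    {
        m_lastHeartbeat = Instant.now();
    }
}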
The above WatchDogExecutor tracks which thread is currently executing by modifying EXECUTORS (not shown). While a task is running, it needs to call WatchDogExecutor.heartbeat() before the configured timeout elapses (not shown). A separate repeating background task needs to go through all the WatchDogExecutors and enforce any timeouts.
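To make the "not shown" parts concrete, here is a minimal sketch of how the registration and the timeout sweep could fit together. It is not a proposed API: the class name TaskWatchdog, the execute() and sweep() methods, and the onTimeout callback are all assumptions made for illustration, and each instance is assumed to guard one task at a time.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical sketch of a task-level watchdog; all names are illustrative. */
public class TaskWatchdog {
    // Watchdogs active on the current thread (nested execute() calls stack up).
    private static final ThreadLocal<Deque<TaskWatchdog>> ACTIVE =
        ThreadLocal.withInitial(ArrayDeque::new);

    // Every watchdog whose task is still running, visible to the sweeping thread.
    private static final Set<TaskWatchdog> RUNNING = ConcurrentHashMap.newKeySet();

    private final String name;
    private final Duration timeout;
    private final Consumer<TaskWatchdog> onTimeout;
    private volatile Instant lastHeartbeat;

    public TaskWatchdog(String name, Duration timeout, Consumer<TaskWatchdog> onTimeout) {
        this.name = name;
        this.timeout = timeout;
        this.onTimeout = onTimeout;
    }

    public String name() {
        return name;
    }

    /** Runs the task on the calling thread while this watchdog is registered. */
    public void execute(Runnable task) {
        lastHeartbeat = Instant.now();
        ACTIVE.get().push(this);
        RUNNING.add(this);
        try {
            task.run();
        } finally {
            RUNNING.remove(this);
            ACTIVE.get().pop();
        }
    }

    /** Called by the task; refreshes every watchdog active on this thread. */
    public static void heartbeat() {
        Instant now = Instant.now();
        ACTIVE.get().forEach(w -> w.lastHeartbeat = now);
    }

    /** Reports each watchdog that has gone silent for longer than its timeout. */
    public static void sweep(Instant now) {
        for (TaskWatchdog w : RUNNING) {
            boolean expired = Duration.between(w.lastHeartbeat, now).compareTo(w.timeout) > 0;
            // remove() returns true only once, so each watchdog fires at most once.
            if (expired && RUNNING.remove(w)) {
                w.onTimeout.accept(w);
            }
        }
    }
}
```

The sweep takes the current time as a parameter so it can be driven by any scheduler. One option is a ScheduledExecutorService that calls TaskWatchdog.sweep(Instant.now()) at a fixed rate. What onTimeout does (log, set a flag, notify the user) is left to the caller.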
Why not just use a Timeout? Because the task is very long running, a plain timeout would have to be several minutes or longer. With a watchdog, the task can time out much sooner if it hangs from the very beginning.
I'll admit I have only needed this kind of watchdog a few times in my career. However, having the functionality readily available would encourage greater use.
The only problem I see: if you have such a rogue task, what can you do with it? You can't actually stop execution in Java. Thread.stop is deprecated, and Thread.interrupt is only a hint. Even the CompletableFuture designers seem to have accepted this: you can't really cancel a CompletableFuture :-)
@magicprinc You can't stop its execution. But you can set a flag to tell it to quit. You can set a flag to invalidate any result it produces. You can log an error. You can tell the user the operation timed out. The world is your playground in how you respond.
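As a rough illustration of the flag idea, the sketch below has the task poll a shared flag between steps and throw away its result if the flag was set. The class CooperativeTask and its methods are hypothetical, and the summing loop stands in for real work.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: cooperative cancellation through a shared flag. */
public class CooperativeTask {
    private final AtomicBoolean timedOut = new AtomicBoolean(false);

    /** Called by the watchdog when heartbeats stop arriving. */
    public void markTimedOut() {
        timedOut.set(true);
    }

    public boolean isTimedOut() {
        return timedOut.get();
    }

    /**
     * Sums 1..steps, checking the flag before each step. Returns empty if the
     * task was marked timed out, so a late result is never handed to the caller.
     */
    public Optional<Long> run(int steps) {
        long sum = 0;
        for (int i = 1; i <= steps; i++) {
            if (timedOut.get()) {
                return Optional.empty(); // quit early when asked to
            }
            sum += i;
        }
        // Check once more so a timeout that arrived during the last step
        // still invalidates the result.
        return timedOut.get() ? Optional.empty() : Optional.of(sum);
    }
}
```

This only helps when the task reaches a point where it checks the flag. A task that is truly stuck in a blocking call will not notice, but its result is still discarded once it returns.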