A system such as Derecho or Cascade should normally be completely silent, printing error messages only in extreme situations. But during a demo of Cascade that Weijia ran yesterday, we saw dozens of error messages that Weijia had to keep explaining ("ignore that, it is just because the client disconnected").
Yesterday I posted a separate issue about the least graceful shutdown sequence in the universe -- that one arises if you have a group of 3 or 5 nodes and they all just exit without shutting the system down first. A lot of alarming messages are printed, we try to form new views, etc.
But we also need to make "silent by default" the rule for external clients connected to Derecho, and for top-level clients that exit for some reason. These events break both the TCP channels and the RDMA connections, so the failures are sensed by a variety of logic -- some owned by Edward, some by Sagar, some by Weijia. All of this code should print messages only if some form of "verbose" compile-time constant is set to true, and should otherwise be totally silent (or you can perhaps put a message in a log, but absolutely not on the console). In fact, it should be viewed as a bug if the system prints a message that did not absolutely need to be printed.
Example: "Garbled log, unable to restart" -- this would be a legitimate message to print. It relates to a genuinely unusual issue.
"Error -17 on connection to node 123.65.17.221" -- this is a "bad" message to print, except when debugging.
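To make the rule concrete, here is a rough sketch of what "silent by default" could look like. None of this is existing Derecho or Cascade code: the `VERBOSE_CONSOLE` macro, the `Severity` enum, and the `report` helper are all hypothetical names made up for illustration. The idea is simply that every message goes to a log sink, while the console sees it only if it is genuinely fatal or the verbose constant was set at compile time.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical compile-time switch (not an existing Derecho flag).
// Build with -DVERBOSE_CONSOLE=1 when debugging; the default is silent.
#ifndef VERBOSE_CONSOLE
#define VERBOSE_CONSOLE 0
#endif

// Hypothetical classification of messages.
enum class Severity {
    Routine,  // expected events, e.g. a client disconnecting
    Fatal     // genuinely unusual, e.g. a garbled log that blocks restart
};

// True if a message of this severity may appear on the console.
constexpr bool goes_to_console(Severity s) {
    return s == Severity::Fatal || VERBOSE_CONSOLE != 0;
}

// Always records the message in the log sink; echoes it to the console
// only when goes_to_console() allows it.
inline void report(Severity s, const std::string& msg,
                   std::ostream& console, std::ostream& log) {
    log << msg << '\n';
    if (goes_to_console(s)) {
        console << msg << '\n';
    }
}
```

With the default build, `report(Severity::Routine, "Error -17 on connection to node 123.65.17.221", std::cerr, logfile)` would write only to `logfile`, while `report(Severity::Fatal, "Garbled log, unable to restart", std::cerr, logfile)` would reach both. In the real code the sinks would presumably be whatever logging facility each component already uses; the streams here just keep the sketch self-contained.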
Please view this as something important for our V2.2 release, which presumably will be in the February/March/April timeframe.