Unwrap ErrorPolykeyRemote when reporting errors #107

Open
tegefaulkes opened this issue Jan 29, 2024 · 13 comments
Labels
development Standard development r&d:polykey:supporting activity Supporting core activity

Comments

@tegefaulkes
Contributor

tegefaulkes commented Jan 29, 2024

Specification

When an RPC call fails it throws ErrorPolykeyRemote with the true error as the cause. This is very noisy when printed by the CLI command and is frankly confusing at times. When reporting these errors we want to unwrap the ErrorPolykeyRemote and just report the actual cause of the error.

Additional context

Tasks

  1. Unwrap the ErrorPolykeyRemote error and just report the cause when reporting the RPC errors.
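Below is a minimal TypeScript sketch of this task. It assumes errors are plain objects with a name, a message and an optional cause. The ReportableError interface and the unwrapRemote function are hypothetical illustrations, not Polykey's actual API:

```typescript
// Hypothetical minimal error shape; real Polykey errors carry more fields.
interface ReportableError {
  name: string;
  message: string;
  cause?: ReportableError;
}

// Strip any ErrorPolykeyRemote wrappers and return the first non-remote cause.
// A wrapper without a cause is returned unchanged so nothing is lost.
function unwrapRemote(err: ReportableError): ReportableError {
  let current = err;
  while (current.name === "ErrorPolykeyRemote" && current.cause !== undefined) {
    current = current.cause;
  }
  return current;
}

const remote: ReportableError = {
  name: "ErrorPolykeyRemote",
  message: "Remote error from RPC call",
  cause: {
    name: "ErrorSecretsSecretUndefined",
    message: "Secret with name: customer_nubmer does not exist",
  },
};

const shown = unwrapRemote(remote);
console.log(`${shown.name}: ${shown.message}`);
```

The loop also handles a remote error that wraps another remote error. Errors that were never wrapped pass through untouched.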
@tegefaulkes tegefaulkes added the development Standard development label Jan 29, 2024
@CMCDragonkai CMCDragonkai added the r&d:polykey:supporting activity Supporting core activity label Aug 13, 2024
Member

CMCDragonkai commented Aug 22, 2024

Only the commands that perform remote calls should unwrap it, because they know why the error happened: the remote side failed. However, there's a caveat.

This is what it looks like when there's a remote side error:

ErrorPolykeyRemote: Remote error from RPC call
  localHost    ::1
  localPort    45402
  remoteHost    ::1
  remotePort    33107
  command    vaultsSecretsGet
  timestamp    Thu Aug 22 2024 20:37:34 GMT+1000 (Australian Eastern Standard Time)
  cause: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
    cause: {"type":"ErrorEncryptedFSError","data":{"message":"ENOENT: no such file or directory, customer_nubmer","timestamp":"2024-08-22T10:37:34.634Z","data":{},"stack":"ErrorEncryptedFSError: ENOENT: no such file or directory, customer_nubmer\n    at f._open (/home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2158:117625)\n    at async /home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2158:115345\n    at async Object.maybeCallback (/home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2155:21076)\n    at async /home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2158:120774\n    at async Object.maybeCallback (/home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2155:21076)\n    at async /home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2283:135663\n    at async /home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2246:8712\n    at async withF (/home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:7:9819)\n    at async Object.getSecret (/home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2283:135640)\n    at async /home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2283:142996","_errno":34,"_code":"ENOENT","_description":"no such file or directory","_syscall":"open"}}

You can see here that, firstly, it's quite verbose, with a lot of metadata. This is being discussed in ENG-119 or #17. Additionally, the error formatting isn't being applied recursively: there's another cause after the first cause, and it is rendered as raw JSON.

Now the main problem here is how to properly communicate that the error comes from the remote side and is not an error on the local side. This is where the command receiving this error needs to catch and interrogate it. If we can reduce verbosity, I imagine it would look like:

ErrorPolykeyRemote: Remote error from RPC call
  cause: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
    cause: ErrorEncryptedFSError: ENOENT: no such file or directory, customer_nubmer
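One way to render the reduced-verbosity chain above recursively is sketched below. It also handles a cause that arrives as a JSON string, like the one in the earlier output. This sketch rests on assumptions: the ReportableError shape and the {"type", "data": {"message"}} parsing in toError are inferred from that output. Nested causes inside the JSON are not followed.

```typescript
// Hypothetical error shape. A cause may be another error object or, as in the
// output above, a JSON string such as {"type": ..., "data": {"message": ...}}.
interface ReportableError {
  name: string;
  message: string;
  cause?: ReportableError | string;
}

// Turn a JSON-serialised cause back into a name/message pair.
function toError(cause: ReportableError | string): ReportableError {
  if (typeof cause !== "string") return cause;
  try {
    const parsed = JSON.parse(cause) as { type?: string; data?: { message?: string } };
    return { name: parsed.type ?? "Error", message: parsed.data?.message ?? "" };
  } catch {
    // Not usable JSON: show the raw string as the message.
    return { name: "Error", message: cause };
  }
}

// Render outside-in, indenting each cause by two more spaces per level.
function formatChain(err: ReportableError, depth = 0): string {
  const prefix = depth === 0 ? "" : "  ".repeat(depth) + "cause: ";
  const line = `${prefix}${err.name}: ${err.message}`;
  if (err.cause === undefined) return line;
  return `${line}\n${formatChain(toError(err.cause), depth + 1)}`;
}

const err: ReportableError = {
  name: "ErrorPolykeyRemote",
  message: "Remote error from RPC call",
  cause: {
    name: "ErrorSecretsSecretUndefined",
    message: "Secret does not exist - Secret with name: customer_nubmer does not exist",
    cause: JSON.stringify({
      type: "ErrorEncryptedFSError",
      data: { message: "ENOENT: no such file or directory, customer_nubmer" },
    }),
  },
};

console.log(formatChain(err));
```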

Yet I still don't like this. There's an old issue, which I can't find anymore, that talked about flipping this chain order, because the error I actually care about is the last one.

If the errors are really long, flipping obviously favours the last message; but if they are short, it's clearer to write down both the actual error and the wrapper.

ErrorPolykeyRemote: Remote error from RPC call
  cause: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
    cause: ErrorEncryptedFSError: ENOENT: no such file or directory, customer_nubmer

But then one also has to be discerning about which level of the error chain is relevant.

Only the command understands the context; that is, it knows that ErrorSecretsSecretUndefined is what really matters. In that case, assuming low-verbosity error reporting, it should drop irrelevant information and say:

ErrorEncryptedFSError: ENOENT: no such file or directory, customer_nubmer
  caused: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
    caused: ErrorPolykeyRemote: Remote error from RPC call 

This would be an interesting way of reporting it, because it's not how the error objects themselves are structured: you have to invert the chain to report it. It could be confusing if we later increase error verbosity and end up reversing it back to the normal object chain in order to show all the error object properties properly. It also wouldn't match the JSON version of errors.

Finally, the command also has enough context to understand that ErrorSecretsSecretUndefined alone is sufficient to explain the problem, and thus to drop ErrorEncryptedFSError entirely when reporting. However, that error object still exists. This means the command itself, when returning the error object, should provide an "index" tag back to the reporter so that it knows which error message to focus on. So I would like to see something like:

ErrorPolykeyRemote: Remote error from RPC call
  cause: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
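The "index" tag idea could be sketched as follows. Here focusIndex is a hypothetical value that the command would return alongside the error: 0 is the outermost error, 1 its cause, and so on. Levels deeper than the index remain on the object but are not printed.

```typescript
// Hypothetical minimal error shape.
interface ReportableError {
  name: string;
  message: string;
  cause?: ReportableError;
}

// Render the chain outside-in, stopping after the level the command marked as relevant.
function formatFocused(err: ReportableError, focusIndex: number): string {
  const lines: string[] = [];
  let current: ReportableError | undefined = err;
  for (let depth = 0; current !== undefined && depth <= focusIndex; depth++) {
    const prefix = depth === 0 ? "" : "  ".repeat(depth) + "cause: ";
    lines.push(`${prefix}${current.name}: ${current.message}`);
    current = current.cause;
  }
  return lines.join("\n");
}

const err: ReportableError = {
  name: "ErrorPolykeyRemote",
  message: "Remote error from RPC call",
  cause: {
    name: "ErrorSecretsSecretUndefined",
    message: "Secret does not exist - Secret with name: customer_nubmer does not exist",
    cause: {
      name: "ErrorEncryptedFSError",
      message: "ENOENT: no such file or directory, customer_nubmer",
    },
  },
};

// The secrets command decides that level 1 already explains the failure.
console.log(formatFocused(err, 1));
```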

Additionally we may drop descriptions in favour of messages if the message is sufficient. We may keep the description in case the message doesn't exist. This makes the description a sort of "default message". Thus giving us:

ErrorPolykeyRemote: Remote error from RPC call
  cause: ErrorSecretsSecretUndefined: Secret with name: customer_nubmer does not exist
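A sketch of "description as default message" follows. It assumes each error object exposes both a message and a description. In the real error classes, the description may be defined per class rather than per instance. DescribedError and reportedMessage are hypothetical names.

```typescript
// Hypothetical shape: description is the generic, class-level explanation.
interface DescribedError {
  name: string;
  message: string;      // instance-specific message; may be empty
  description: string;  // default message, e.g. "Secret does not exist"
}

// The message wins when present; the description is only a fallback.
function reportedMessage(err: DescribedError): string {
  return err.message !== "" ? err.message : err.description;
}

const withMessage: DescribedError = {
  name: "ErrorSecretsSecretUndefined",
  description: "Secret does not exist",
  message: "Secret with name: customer_nubmer does not exist",
};
const withoutMessage: DescribedError = { ...withMessage, message: "" };

console.log(`${withMessage.name}: ${reportedMessage(withMessage)}`);
console.log(`${withoutMessage.name}: ${reportedMessage(withoutMessage)}`);
```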

Now why do we keep the Error… class name? Because the class name is in fact the error code: class names are unique global names in the Polykey ecosystem, so they can be referenced. It's a bit like how Windows shows error codes such as 14598374, which you can give to the devs to debug. However, since this is programmatic information, it should not necessarily be featured first.

Remote error from RPC call
  name: ErrorPolykeyRemote
  cause: Secret with name: customer_nubmer does not exist
    name:  ErrorSecretsSecretUndefined

I still don't quite like this. I think instead the "code" in this case being the unique error class name, should actually be on the same line. All of our errors should be easily explained in one line (or can they…?)

Remote error from RPC call : ErrorPolykeyRemote
  cause: Secret with name: customer_nubmer does not exist : ErrorSecretsSecretUndefined

The : here is space-separated, unlike before. We could also use | to be clearer.

While the inverted chain may be better in just understanding, the fact that it ends up being flipped when it is actually verbose I think is a bad trade off.

@CMCDragonkai
Member

CMCDragonkai commented Aug 22, 2024

Remote error from RPC call | ErrorPolykeyRemote
  cause: Secret with name: customer_nubmer does not exist | ErrorSecretsSecretUndefined

Remote error from RPC call - ErrorPolykeyRemote
  cause: Secret with name: customer_nubmer does not exist - ErrorSecretsSecretUndefined

I do prefer the dash.
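The dash format could be produced by a small recursive renderer like the one below. It keeps the ErrorPolykeyRemote line, so the local/remote distinction survives. It falls back to the description when the message is empty. The interface and function names are illustrative assumptions.

```typescript
// Hypothetical minimal error shape.
interface ReportableError {
  name: string;
  message: string;
  description?: string;
  cause?: ReportableError;
}

// One line per error: "<message> - <ErrorClassName>". The human-readable text
// comes first; the unique class name trails it as a greppable error code.
function formatDashed(err: ReportableError, depth = 0): string {
  const text = err.message !== "" ? err.message : err.description ?? "";
  const prefix = depth === 0 ? "" : "  ".repeat(depth) + "cause: ";
  const line = `${prefix}${text} - ${err.name}`;
  return err.cause === undefined ? line : `${line}\n${formatDashed(err.cause, depth + 1)}`;
}

const err: ReportableError = {
  name: "ErrorPolykeyRemote",
  message: "Remote error from RPC call",
  cause: {
    name: "ErrorSecretsSecretUndefined",
    message: "Secret with name: customer_nubmer does not exist",
  },
};

console.log(formatDashed(err));
```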

@CMCDragonkai
Member

CMCDragonkai commented Aug 22, 2024

So that would mean we don't technically unwrap it, because we do still need to indicate whether it is a local error or a remote error.

Inversion is too confusing when the high verbosity output will be in the regular format.

So the only real changes here are verbosity itself and moving the error class name to the very end of the message. This assumes all error messages can be explained in one short line, and that messages override descriptions when reported, so that descriptions act as the default error message.

What if the error message is far longer, and has to take up multiple lines?

Some error and error and error, sdfds.
And this is because of this reason and that reason.

And this paragraph exists too.
- ErrorPolykeySomethingMultiLineErrorMessage
  cause: Some other reason blah blah blah
         indented multiline message of course!
         - ErrorPolykeySomething
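A sketch of how multi-line messages could be laid out in this style is shown below. Continuation lines are padded to align under the first character of the message. The class name moves to its own "- Name" line only when the message spans several lines. formatEntry and formatMultiline are hypothetical helpers.

```typescript
// Hypothetical minimal error shape.
interface ReportableError {
  name: string;
  message: string;
  cause?: ReportableError;
}

// Lay out one error. Single-line messages get " - Name" appended; multi-line
// messages get aligned continuation lines and a trailing "- Name" line.
function formatEntry(name: string, message: string, depth: number): string[] {
  const head = depth === 0 ? "" : "  ".repeat(depth) + "cause: ";
  const pad = " ".repeat(head.length);
  const msgLines = message.split("\n");
  if (msgLines.length === 1) return [`${head}${message} - ${name}`];
  const out = msgLines.map((line, i) => (i === 0 ? head + line : line === "" ? "" : pad + line));
  out.push(`${pad}- ${name}`);
  return out;
}

function formatMultiline(err: ReportableError, depth = 0): string {
  const lines = formatEntry(err.name, err.message, depth);
  if (err.cause !== undefined) lines.push(formatMultiline(err.cause, depth + 1));
  return lines.join("\n");
}

const err: ReportableError = {
  name: "ErrorPolykeySomethingMultiLineErrorMessage",
  message:
    "Some error and error and error, sdfds.\n" +
    "And this is because of this reason and that reason.\n\n" +
    "And this paragraph exists too.",
  cause: {
    name: "ErrorPolykeySomething",
    message: "Some other reason blah blah blah\nindented multiline message of course!",
  },
};

console.log(formatMultiline(err));
```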

@aryanjassal
Member

I personally really like the Python way of handling errors, and I believe it would be easier to read for non-technical users. Something like this might work:

Traceback (most recent call last):
  File "/home/aryanj/Downloads/test.py", line 14, in <module>
    fn4()
  File "/home/aryanj/Downloads/test.py", line 11, in fn4
    fn3()
  File "/home/aryanj/Downloads/test.py", line 8, in fn3
    fn2()
  File "/home/aryanj/Downloads/test.py", line 5, in fn2
    fn1()
  File "/home/aryanj/Downloads/test.py", line 2, in fn1
    raise AttributeError
AttributeError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aryanj/Downloads/test.py", line 16, in <module>
    raise ValueError
ValueError

I just wrote this test script, which raises two errors. Not much nesting or indentation is seen here, but all the information is provided to the reader. Of course, we might not have details like the file name or line number, but those can be replaced with the error that was raised, etc.

For example, this is how I currently do error displaying in #255

[aryanj@matrix-34xx]$ pk secrets ls newvault:/doesntexist

ErrorPolykeyCLIFileRead: Failed to read from filesystem: Failed to read directory: /doesntexist

I did this by catching the thrown remote error, and throwing the unwrapped error with the same details without any additional metadata. I will remove this before merging the PR, but this made it much easier to read and understand what is going on and what went wrong during development.

Member

I'm very aware of how python does their errors. Make sure you read my post to the end! Our cause chain is a more streamlined version of python traces.

@CMCDragonkai
Member

My favourite is this atm.

Remote error from RPC call - ErrorPolykeyRemote
  cause: Secret with name: customer_nubmer does not exist - ErrorSecretsSecretUndefined

But read what I wrote earlier as it explains that iteration.

@CMCDragonkai
Member

Technically, Python's report is actually the inverted version of ours.

In a way it's doing a stack trace which is inverted.

The object error chain goes outside in, while the stack trace goes inside out.
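The sketch below makes the difference concrete. Flattening the cause chain gives the object order (outside in). Reversing it gives the Python-style order (root cause first). The separator string is the one CPython prints for exceptions chained with an explicit cause. The helper names are hypothetical.

```typescript
// Hypothetical minimal error shape.
interface ReportableError {
  name: string;
  message: string;
  cause?: ReportableError;
}

// Object order: outermost wrapper first, root cause last.
function flattenChain(err: ReportableError): ReportableError[] {
  const chain: ReportableError[] = [];
  let current: ReportableError | undefined = err;
  while (current !== undefined) {
    chain.push(current);
    current = current.cause;
  }
  return chain;
}

// Python-style order: root cause first, each wrapper after it.
function formatInsideOut(err: ReportableError): string {
  return flattenChain(err)
    .reverse()
    .map((e) => `${e.name}: ${e.message}`)
    .join("\n\nThe above exception was the direct cause of the following exception:\n\n");
}

const err: ReportableError = {
  name: "ErrorPolykeyRemote",
  message: "Remote error from RPC call",
  cause: {
    name: "ErrorSecretsSecretUndefined",
    message: "Secret with name: customer_nubmer does not exist",
    cause: {
      name: "ErrorEncryptedFSError",
      message: "ENOENT: no such file or directory, customer_nubmer",
    },
  },
};

console.log(formatInsideOut(err));
```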

Member

This is kinda related to MatrixAI/js-errors#21, so I might take this over too.

How do we want to deal with this? Is there a format or standard that we have decided upon?

My favourite is this atm.

This changes the way we currently format errors. I've made some comments on how we can update the formatting of errors too. MatrixAI/js-errors#21 (comment)

@aryanjassal aryanjassal self-assigned this Dec 10, 2024
@CMCDragonkai
Member

Can you read my comment and summarise the tradeoffs and preferred style?

Member

Error Reporting Formats: Summary and Analysis

1. Verbose Nested Format

Example:

ErrorPolykeyRemote: Remote error from RPC call
  localHost ::1
  localPort 45402
  remoteHost ::1
  remotePort 33107
  command vaultsSecretsGet
  timestamp Thu Aug 22 2024 20:37:34 GMT+1000 (Australian Eastern Standard Time)
  cause: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
    cause: {"type":"ErrorEncryptedFSError","data":{"message":"ENOENT: no such file or directory, customer_nubmer","timestamp":"2024-08-22T10:37:34.634Z","data":{},"stack":"..."}}

Pros:

  • Highly detailed with all relevant metadata.
  • Useful for debugging as it provides information about the remote context, timestamps, and the full chain of causes.

Cons:

  • Overly verbose for end users or high-level developers; requires effort to parse.
  • Noise from excessive metadata can obfuscate the root cause of the error.
  • Not suitable for low-verbosity error reporting.

2. Simplified Chain Format

Example:

ErrorPolykeyRemote: Remote error from RPC call
  cause: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
    cause: ErrorEncryptedFSError: ENOENT: no such file or directory, customer_nubmer

Pros:

  • Reduces verbosity while retaining key information.
  • Clear separation of root cause and context.
  • Easier to read for developers who need actionable insights.

Cons:

  • Still retains nesting, which can be confusing for non-technical users.
  • Does not eliminate redundant metadata entirely.
  • May still include irrelevant details for specific audiences.

3. Inverted Chain Format

Example:

ErrorEncryptedFSError: ENOENT: no such file or directory, customer_nubmer
  caused: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
    caused: ErrorPolykeyRemote: Remote error from RPC call

Pros:

  • Highlights the most relevant (leaf-level) error first.
  • Useful for quickly identifying the actionable error cause.
  • Intuitive for short error chains.

Cons:

  • Inconsistent with standard nesting conventions (e.g., JSON or stack traces).
  • Reversing the chain for low verbosity but restoring it for high verbosity adds complexity.
  • Difficult to align with existing error object structures.

4. Reduced, Context-Aware Format

Example:

ErrorPolykeyRemote: Remote error from RPC call
  cause: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist

Pros:

  • Minimal and focused on the most relevant error.
  • Drops irrelevant metadata, improving readability for end users.
  • Customizable verbosity allows commands to filter out unnecessary details dynamically.

Cons:

  • Relies on the command to "understand" the most relevant error, adding responsibility to the caller.
  • May omit useful debugging context if misconfigured.
  • Might not align well with automated systems parsing logs.

5. Flat One-Line Format

Example:

Remote error from RPC call : ErrorPolykeyRemote
  cause: Secret with name: customer_nubmer does not exist : ErrorSecretsSecretUndefined

Pros:

  • Concise and ideal for scenarios where brevity is valued.
  • Maintains readability by using inline formatting.
  • Useful for logs or CLI outputs targeting general users.

Cons:

  • Sacrifices detail for brevity; harder to debug complex issues.
  • Inline metadata might still confuse non-technical users.
  • Not suitable for deeply nested error chains.

6. Python-Style Traceback Format

Example:

Traceback (most recent call last):
  File "polykey.js", line 2158, in getSecret
    raise ErrorEncryptedFSError: ENOENT: no such file or directory, customer_nubmer
ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
ErrorPolykeyRemote: Remote error from RPC call

Pros:

  • Familiar format for developers, especially in Python-like ecosystems.
  • Provides full traceback for debugging while focusing on the root error.
  • Clear separation of errors in a readable stack structure.

Cons:

  • Not intuitive for non-technical users.
  • Requires file/line references or stack trace context, which might not always be available.
  • Can be overly verbose for CLI use.

7. Minimalist, Human-Friendly Format

Example:

Remote error from RPC call - ErrorPolykeyRemote
  cause: Secret with name: customer_nubmer does not exist - ErrorSecretsSecretUndefined

Pros:

  • Balances readability and context for non-technical users.
  • Focuses on the error message while retaining minimal metadata for debugging.
  • Ideal for end-user applications like CLI outputs.

Cons:

  • Omits deeper context, making debugging harder for developers.
  • Limited applicability in scenarios requiring detailed error logs.
  • Risks oversimplification in complex error chains.

Recommendations

  • For Developers: Use Python-Style Traceback Format or Verbose Nested Format during development for rich debugging information.
  • For End Users: Opt for Minimalist, Human-Friendly Format or Flat One-Line Format to prioritize readability.
  • For Dynamic Use: Employ Reduced, Context-Aware Format with adjustable verbosity to strike a balance based on the audience.
  • Avoid Inversion: The Inverted Chain Format is unconventional and risks introducing inconsistency across verbosity levels.

Updated Error Reporting: Conciseness for End Users vs. Verbosity for Logs

The suggestion aligns with creating a dual-layered error reporting system:

  1. User-Facing Errors: Concise, clear, and actionable.
  2. Log-Facing Errors: Verbose, detailed, and developer-oriented.

Comparison of Proposed Error Formats

1. Current Verbose Nested Format

Example:

ErrorPolykeyRemote: Remote error from RPC call
  localHost	::1
  localPort	59558
  remoteHost	::1
  remotePort	43305
  command	vaultsSecretsEnv
  timestamp	Fri Dec 06 2024 17:50:40 GMT+1100 (Australian Eastern Daylight Time)
  cause: ErrorVaultsVaultUndefined: Vault does not exist

Pros:

  • Comprehensive metadata for debugging.
  • Ideal for developers working on fixing backend issues.

Cons:

  • Overwhelming for end users, who may only need a short explanation.
  • Redundant fields in simple cases like "Vault does not exist."
  • Poor readability in CLI or UI contexts.

2. Concise User-Facing Format

Example:

ErrorVaultsVaultUndefined: Vault does not exist

Pros:

  • Direct and easy for end users to understand.
  • Only shows actionable information relevant to the user.

Cons:

  • Strips away metadata required for debugging.
  • Developers need separate access to logs to trace issues.

3. Enhanced Concise Format with Cause

Example:

ErrorPolykeyRemote: Vault does not exist (vaultsSecretsEnv RPC call failed)

Pros:

  • Combines clarity with a hint of context (e.g., where the error occurred).
  • Suitable for power users who want a bit more detail.

Cons:

  • Adds slight verbosity, which might still be unnecessary for beginners.
  • Hard to scale for deeply nested errors without increasing complexity.

4. Detailed Error Chain for Power Users

Example:

ErrorPolykeyRemote: Something bad happened (connection lost)
  causes: 
    ErrorEFS: Cannot write to directory (attempt to write ./???/)
    ErrorTimeout: Connection to EFS timed out

Pros:

  • Useful for intermediate users who want insight without diving into full logs.
  • Strikes a balance between verbosity and conciseness.

Cons:

  • Can still overwhelm beginners with excessive detail.
  • Requires careful design to decide when and how to show nested causes.

5. Dual-Layered Approach

User-Facing Error:

ErrorVaultsVaultUndefined: Vault does not exist

Log-Facing Error:

ErrorPolykeyRemote: Remote error from RPC call
  localHost	::1
  localPort	59558
  remoteHost	::1
  remotePort	43305
  command	vaultsSecretsEnv
  timestamp	Fri Dec 06 2024 17:50:40 GMT+1100 (Australian Eastern Daylight Time)
  cause: ErrorVaultsVaultUndefined: Vault does not exist
    cause: ErrorEFS: Cannot write to directory (attempt to write ./???/)

Pros:

  • Highly adaptable to different audiences.
  • Users see only what they need; developers have detailed logs for debugging.
  • Reduces clutter in CLI/UI while retaining full traceability.

Cons:

  • Requires dual implementation: concise rendering for users, verbose logging for backend.
  • Adds complexity to the error-reporting system.

Updated Recommendations

  1. Adopt a Dual-Layered Error Reporting System:
    • User-Facing Errors:
      • Default to single-line errors where possible.
      • Include brief context for slightly more complex errors (e.g., RPC failure).
      • Avoid technical jargon, stack traces, or redundant metadata.
    • Log-Facing Errors:
      • Preserve full verbosity and detailed trace information for debugging.
      • Include the full chain of causes, metadata, and timestamps.
  2. Use Context-Aware Error Filtering:
    • Commands should discern what part of the error is relevant to the user.
    • For example, hide irrelevant layers like ErrorEFS when ErrorVaultsVaultUndefined explains the issue sufficiently.
  3. Implement Verbosity Levels:
    • Allow users to toggle verbosity (--verbose or --debug flags in CLI tools).
    • At low verbosity, display the first relevant error with minimal context.
    • At high verbosity, show the complete error chain in developer-friendly format.
  4. Standardize Formatting for User-Facing Errors:
    • Use a clear, consistent template:

      ErrorName: Description (Message)
      
    • Optionally, append context in parentheses for clarity:

      ErrorPolykeyRemote: Vault does not exist (vaultsSecretsEnv RPC call failed)
      
  5. Leverage Logs for Detailed Debugging:
    • Redirect full stack traces and metadata to logs.
    • Users encountering issues can share logs with support teams, ensuring developers still have access to necessary context.
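Recommendations 3 and 4 could be sketched as a single render function with three verbosity levels. This sketch rests on several assumptions. The Verbosity type and render are hypothetical. "The first relevant error" is approximated by simply skipping ErrorPolykeyRemote wrappers, whereas the context-aware filtering in recommendation 2 would need per-command knowledge.

```typescript
type Verbosity = "low" | "moderate" | "high";

// Hypothetical error shape; data holds metadata such as the RPC command name.
interface ReportableError {
  name: string;
  message: string;
  data?: Record<string, string>;
  cause?: ReportableError;
}

// Skip ErrorPolykeyRemote wrappers to reach the error the user should see.
function firstNonRemote(err: ReportableError): ReportableError {
  let current = err;
  while (current.name === "ErrorPolykeyRemote" && current.cause !== undefined) {
    current = current.cause;
  }
  return current;
}

// Full outside-in chain with metadata, as in the log-facing example.
function renderFull(err: ReportableError, depth = 0): string[] {
  const indent = "  ".repeat(depth);
  const label = depth === 0 ? "" : "cause: ";
  const lines = [`${indent}${label}${err.name}: ${err.message}`];
  for (const [key, value] of Object.entries(err.data ?? {})) {
    lines.push(`${indent}  ${key}\t${value}`);
  }
  if (err.cause !== undefined) lines.push(...renderFull(err.cause, depth + 1));
  return lines;
}

function render(err: ReportableError, verbosity: Verbosity): string {
  if (verbosity === "high") return renderFull(err).join("\n");
  const shown = firstNonRemote(err);
  const base = `${shown.name}: ${shown.message}`;
  const command = err.name === "ErrorPolykeyRemote" ? err.data?.command : undefined;
  if (verbosity === "moderate" && command !== undefined) {
    return `${base} (${command} RPC call failed)`;
  }
  return base;
}

const err: ReportableError = {
  name: "ErrorPolykeyRemote",
  message: "Remote error from RPC call",
  data: { command: "vaultsSecretsEnv", remoteHost: "::1" },
  cause: { name: "ErrorVaultsVaultUndefined", message: "Vault does not exist" },
};

for (const level of ["low", "moderate", "high"] as const) {
  console.log(`[${level}]\n${render(err, level)}`);
}
```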

Final Example Output

User-Facing (Low Verbosity):

ErrorVaultsVaultUndefined: Vault does not exist

User-Facing (Moderate Verbosity):

ErrorVaultsVaultUndefined: Vault does not exist (vaultsSecretsEnv RPC call failed)

Log-Facing (High Verbosity):

ErrorPolykeyRemote: Remote error from RPC call
  localHost	::1
  localPort	59558
  remoteHost	::1
  remotePort	43305
  command	vaultsSecretsEnv
  timestamp	Fri Dec 06 2024 17:50:40 GMT+1100 (Australian Eastern Daylight Time)
  cause: ErrorVaultsVaultUndefined: Vault does not exist
    cause: ErrorEFS: Cannot write to directory (attempt to write ./???/)

This approach ensures readability, usability, and traceability across diverse user roles and scenarios.

Member

I still think that the errors on the front-facing CLI shouldn't have too much information, and concise one-liners are the way to go here to inform the user. These messages can be printed to the logs in full verbosity like we currently see.

@CMCDragonkai
Member

There are 2 slightly conflicting goals here. One is a useful error message to the end user so they know how to CORRECT their "usage" of the program. Another is a useful error message for developers to understand how to CORRECT the behaviour of the program.

This means that when we report errors on the CLI, we really want a system that can do both: when an error occurs, it could be for either reason, and it's hard to know which. And when the user needs to submit a bug report, it's important that they can copy the error output and provide that information.

In order to achieve this, we need to distinguish "usage" errors vs non-usage errors.

We actually have this already in the sysexit codes. However those are defined on an exception basis, whereas what is considered a user error and what is considered a behaviour error sometimes depends on the command itself.

So we can have a "base" distinction from the sysexit code, and then a more specific distinction dispatched in the command handler itself.

What this means is that at the UI level, usage errors should be simple and to the point. But behaviour errors need to be explicit and detailed.
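Below is a sketch of that two-step dispatch. It assumes a base classification from the sysexits.h exit code, plus per-command overrides keyed by error class name. The numeric constants are the standard sysexits.h values. Everything else is hypothetical: which codes count as "usage", the exit code attached to the example error, and the classify and report helpers.

```typescript
type ErrorKind = "usage" | "behaviour";

// Exit code values from BSD <sysexits.h>.
const EX_USAGE = 64;
const EX_DATAERR = 65;
const EX_NOINPUT = 66;
const EX_SOFTWARE = 70;

// Assumed base split: codes describing bad input from the caller are usage errors.
const usageExitCodes: ReadonlySet<number> = new Set([EX_USAGE, EX_DATAERR, EX_NOINPUT]);

interface ReportableError {
  name: string;
  message: string;
  exitCode: number;
  cause?: ReportableError;
}

// A command-specific override takes precedence over the sysexit-based default.
function classify(err: ReportableError, overrides: ReadonlyMap<string, ErrorKind>): ErrorKind {
  return overrides.get(err.name) ?? (usageExitCodes.has(err.exitCode) ? "usage" : "behaviour");
}

function report(err: ReportableError, overrides: ReadonlyMap<string, ErrorKind> = new Map()): string {
  // Usage errors: short and to the point, so the user can correct their input.
  if (classify(err, overrides) === "usage") return `${err.message} - ${err.name}`;
  // Behaviour errors: the whole chain plus exit codes, suitable for a bug report.
  const lines: string[] = [];
  let current: ReportableError | undefined = err;
  let depth = 0;
  while (current !== undefined) {
    const prefix = depth === 0 ? "" : "  ".repeat(depth) + "cause: ";
    lines.push(`${prefix}${current.message} - ${current.name} (exit code ${current.exitCode})`);
    current = current.cause;
    depth++;
  }
  return lines.join("\n");
}

// Hypothetical: assume this error carries EX_SOFTWARE by default.
const notFound: ReportableError = {
  name: "ErrorSecretsSecretUndefined",
  message: "Secret with name: customer_nubmer does not exist",
  exitCode: EX_SOFTWARE,
};

console.log(report(notFound));
// A secrets command may know that a missing secret is the user's mistake.
console.log(report(notFound, new Map<string, ErrorKind>([["ErrorSecretsSecretUndefined", "usage"]])));
```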

Member

I can push up a quick fix which strips the ErrorPolykeyRemote wrapper and only displays the cause of it with the current rendering systems. This would at least clean up the output while we discuss the optimal way of dealing with this issue. From my experience, the only relevant information in the wrapper is the command which caused the failure, but that can be stuffed in the data field of the cause.
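Here is a sketch of that quick fix, assuming plain-object errors. stripRemoteWrapper is a hypothetical name. Only the command field is carried over from the wrapper's data, as suggested above.

```typescript
// Hypothetical minimal error shape.
interface ReportableError {
  name: string;
  message: string;
  data?: Record<string, unknown>;
  cause?: ReportableError;
}

// Replace an ErrorPolykeyRemote wrapper by its cause, copying the RPC command
// into the cause's data so it is not lost. The input objects are not mutated.
function stripRemoteWrapper(err: ReportableError): ReportableError {
  if (err.name !== "ErrorPolykeyRemote" || err.cause === undefined) return err;
  const command = err.data?.command;
  return {
    ...err.cause,
    data: { ...(err.cause.data ?? {}), ...(command !== undefined ? { command } : {}) },
  };
}

const remote: ReportableError = {
  name: "ErrorPolykeyRemote",
  message: "Remote error from RPC call",
  data: { command: "vaultsSecretsEnv", remotePort: 43305 },
  cause: { name: "ErrorVaultsVaultUndefined", message: "Vault does not exist" },
};

const stripped = stripRemoteWrapper(remote);
console.log(`${stripped.name}: ${stripped.message} (command: ${String(stripped.data?.command)})`);
```

Spreading only copies own enumerable properties. A real Error instance has a non-enumerable message and inherits name from its prototype. For actual error classes, it would therefore be safer to attach the command to the existing cause object than to copy it like this.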

Development

No branches or pull requests

3 participants