Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured formatting of errors when converted to string #21

Open
tegefaulkes opened this issue Oct 16, 2024 · 13 comments
Open

Structured formatting of errors when converted to string #21

tegefaulkes opened this issue Oct 16, 2024 · 13 comments
Assignees
Labels
development Standard development

Comments

@tegefaulkes
Copy link

Specification

In MatrixAI/Polykey#828 I added some code to format the errors in a way that included the cause chain within the error. The intention of this was to get the full context of an error. Especially when it was wrapped in a generic PolykeyRemoteError.

Additional context

Tasks

  1. Extract formatting changes from Adding better error messaging when stringifying errors in logging Polykey#828 into js-errors
  2. Refine the formatting of the error
@tegefaulkes tegefaulkes added the development Standard development label Oct 16, 2024
@tegefaulkes tegefaulkes self-assigned this Oct 16, 2024
Copy link

linear bot commented Oct 16, 2024

@tegefaulkes
Copy link
Author

before I proceed with this we need to settle with a format.

In the trivial sense I could just make the toString return JSON.stringify(this.toJson()). But the format is really annoying to read and includes the stack trace for the error. For a general warning message it's too much information. And the stack trace is not something we want to throw in the users face for a warning message.

{"type":"ErrorTest","description":"static description One","data":{"message":"","timestamp":"2024-10-17T04:45:52.735Z","data":{},"stack":"ErrorTest: \n    at Object.<anonymous> (/home/brian/workspace/js-errors/tests/index.test.ts:344:24)\n    at Promise.then.completed (/home/brian/workspace/js-errors/node_modules/jest-circus/build/utils.js:300:28)\n    at new Promise (<anonymous>)\n    at callAsyncCircusFn (/home/brian/workspace/js-errors/node_modules/jest-circus/build/utils.js:233:10)\n    at _callCircusTest (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:315:40)\n    at processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at _runTest (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:251:3)\n    at _runTestsForDescribeBlock (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:125:9)\n    at _runTestsForDescribeBlock (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:120:9)\n    at _runTestsForDescribeBlock (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:120:9)\n    at run (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:70:3)\n    at runAndTransformResultsToJestFormat (/home/brian/workspace/js-errors/node_modules/jest-circus/build/legacy-code-todo-rewrite/jestAdapterInit.js:122:21)\n    at jestAdapter (/home/brian/workspace/js-errors/node_modules/jest-circus/build/legacy-code-todo-rewrite/jestAdapter.js:79:19)\n    at runTestInternal (/home/brian/workspace/js-errors/node_modules/jest-runner/build/runTest.js:367:16)\n    at runTest (/home/brian/workspace/js-errors/node_modules/jest-runner/build/runTest.js:444:34)"}}

// It gets even worse with a cause chain.
{"type":"ErrorTest","description":"static description One","data":{"message":"With Description","timestamp":"2024-10-17T05:03:04.198Z","data":{},"cause":{"type":"ErrorTest","description":"static description One","data":{"message":"","timestamp":"2024-10-17T05:03:04.198Z","data":{},"stack":"ErrorTest: \n    at Object.<anonymous> (/home/brian/workspace/js-errors/tests/index.test.ts:344:24)\n    at Promise.then.completed (/home/brian/workspace/js-errors/node_modules/jest-circus/build/utils.js:300:28)\n    at new Promise (<anonymous>)\n    at callAsyncCircusFn (/home/brian/workspace/js-errors/node_modules/jest-circus/build/utils.js:233:10)\n    at _callCircusTest (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:315:40)\n    at processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at _runTest (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:251:3)\n    at _runTestsForDescribeBlock (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:125:9)\n    at _runTestsForDescribeBlock (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:120:9)\n    at _runTestsForDescribeBlock (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:120:9)\n    at run (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:70:3)\n    at runAndTransformResultsToJestFormat (/home/brian/workspace/js-errors/node_modules/jest-circus/build/legacy-code-todo-rewrite/jestAdapterInit.js:122:21)\n    at jestAdapter (/home/brian/workspace/js-errors/node_modules/jest-circus/build/legacy-code-todo-rewrite/jestAdapter.js:79:19)\n    at runTestInternal (/home/brian/workspace/js-errors/node_modules/jest-runner/build/runTest.js:367:16)\n    at runTest (/home/brian/workspace/js-errors/node_modules/jest-runner/build/runTest.js:444:34)"}},"stack":"ErrorTest: With Description\n    at Object.<anonymous> (/home/brian/workspace/js-errors/tests/index.test.ts:345:24)\n    at Promise.then.completed (/home/brian/workspace/js-errors/node_modules/jest-circus/build/utils.js:300:28)\n    at new Promise (<anonymous>)\n    at callAsyncCircusFn (/home/brian/workspace/js-errors/node_modules/jest-circus/build/utils.js:233:10)\n    at _callCircusTest (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:315:40)\n    at processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at _runTest (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:251:3)\n    at _runTestsForDescribeBlock (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:125:9)\n    at _runTestsForDescribeBlock (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:120:9)\n    at _runTestsForDescribeBlock (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:120:9)\n    at run (/home/brian/workspace/js-errors/node_modules/jest-circus/build/run.js:70:3)\n    at runAndTransformResultsToJestFormat (/home/brian/workspace/js-errors/node_modules/jest-circus/build/legacy-code-todo-rewrite/jestAdapterInit.js:122:21)\n    at jestAdapter (/home/brian/workspace/js-errors/node_modules/jest-circus/build/legacy-code-todo-rewrite/jestAdapter.js:79:19)\n    at runTestInternal (/home/brian/workspace/js-errors/node_modules/jest-runner/build/runTest.js:367:16)\n    at runTest (/home/brian/workspace/js-errors/node_modules/jest-runner/build/runTest.js:444:34)"}}

We have the format I made already which is a lot more concise if missing some information.

    ErrorTest()

    ErrorTest("With Description")>ErrorTest()

I think there's an inbetween where we use the JSON structure but strip out some extra information like the call stack.

@CMCDragonkai thoughts?

@CMCDragonkai
Copy link
Member

Can you review the issues here: https://github.com/MatrixAI/js-logger/issues
And MatrixAI/js-logger#3, what's where I had wrote my thoughts earlier. So I don't think it's necessary to change the message for human readable logs, although machine logs can have alot more information. Specifically when applied to errors.

@CMCDragonkai
Copy link
Member

I don't like the () in the message, that's reserved for function calls and similar things.

The thing is I guess is that we have error log and warning log that may sometimes be the result of an exception object. That error chain specified in this repo, needs to then be converted appropriately in the js-logger, especially because we are crossing boundaries from object representation to string representation.

Serialisation of the object already has a standard, that's what the JSON thing is for, and should be used for when we log out with JSON format.

Human readable representation does have a similar concept - I think we can preserve the singular message and not change that, there's a reason for this. However additional structured information could be reported with tags, and again should be part of the logging format/verbosity flags.

I say keep the messages as is. And spec out a format symbol for additional key:value information.

@tegefaulkes
Copy link
Author

Based on discussion yesterday, we're going to scrap the custom toString format. Any more complex formatting should either be handled by the logger directly by making use of the toJson function to get the data. For now better formatting of errors should be handled by whatever is logging the error. For example the TaskManager should check the kind of error it gets and construct the formatted string itself when logging the failure warning.

Copy link
Member

I will be taking this over as it is related to Polykey#824

Copy link
Member

aryanjassal commented Dec 6, 2024

Now that we are using Polykey (polykey secrets env to be exact), if the vault does not exist, we get a pretty verbose message.

ErrorPolykeyRemote: Remote error from RPC call
  localHost	::1
  localPort	59558
  remoteHost	::1
  remotePort	43305
  command	vaultsSecretsEnv
  timestamp	Fri Dec 06 2024 17:50:40 GMT+1100 (Australian Eastern Daylight Time)
  cause: ErrorVaultsVaultUndefined: Vault does not exist

This is too verbose compared what we had before, and 6/8 lines are useless in this context.

/tmp/nix-shell.NbpVCi: line 2014: ./.env: No such file or directory

While our method (the verbose one) is exceptionally useful for developers, it is unnecessarily verbose for end users. It would be similar to sharing the full traceback for the error, which could look like this for the source command.

Error: No such file or directory: ./.env
Backtrace:
   0: main
      at ./main.c:25
   1: execute_source
      at ./source_handler.c:78
   2: shell_builtin_handler
      at ./shell_builtin.c:120
   3: command_dispatch
      at ./command.c:55

So, instead of having a verbose error to aid in debugging, the error should instead be concise for the user, and be verbose in the logs. In this case, the concise error could be printed out to the user, and the verbose error (the error formatting we currently have) can be printed out in the logs. Users can attach their logs, which we could use to correlate and find the issue.

For rendering, instead of trying to show multiple levels of error, we can just show the topmost error (in case of ErrorPolykeyRemote and similar errors, just show the first non-remote error) and show the entire trace in the logs.

Alternatively, if we need to show some errors to the user, they can be rendered as such. Do note that I have changed the error formatting from ErrorName: description - message to ErrorName: description (message). Personally, I prefer the following errors much more than what we currently have facing the end users. I would also recommend hiding the full cause chain for most errors, as most are quite irrelevant to the user and concern the implementation itself.

ErrorPolykeyBad: Something bad happened (who tried dividing by zero?!)

ErrorPolykeyBad: Something bad happened (who tried dividing by zero?!)
  cause: ErrorEFS: Cannot write to directory (attempt to write to ./???/)

ErrorPolykeyBad: Something bad happened (who tried dividing by zero?!)
  causes: 
    ErrorEFS: Cannot write to directory (attempt to write to ./???/)
    ErrorEFS: Cannot open directory (attempt to open ./???/)
    ErrorTimeout: Connection to EFS timed out

I'd say the current state of the errors are trying to be extremely useful for developers, which makes them difficult to understand and work with for end users. As such, we need to reduce the error to a single line (for most cases) for the end user. A product should be sufficiently complex for power users while being easy to use by beginners.

@CMCDragonkai @tegefaulkes

@tegefaulkes
Copy link
Author

ErrorPolykeyRemote: Remote error from RPC call
  localHost	::1
  localPort	59558
  remoteHost	::1
  remotePort	43305
  command	vaultsSecretsEnv
  timestamp	Fri Dec 06 2024 17:50:40 GMT+1100 (Australian Eastern Daylight Time)
  cause: ErrorVaultsVaultUndefined: Vault does not exist

This is intended formatting and I think we should stick to it. It is a little verbose though.

This issue is focusing on how the errors are formatted when converted with .toString() specifcally including details like the cause chain so we don't loose most of the useful information when stringifying errors.

How they are formatted when handled by the CLI formatter is out of scope for this.

Copy link
Member

Then I'm not sure what this issue requires.

I say keep the messages as is. And spec out a format symbol for additional key:value information.

Is this all we need to work on for this issue?

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Dec 15, 2024

There was some custom code added earlier to PK that affected the error reporting. I think that code is a temporary hack. A proper solution means a review of 4 concepts:

  • Diagnostic domain in PK
  • This generic js-error library
  • CLI error reporting for UX purposes
  • Understanding the difference between the json format vs the "human" format

Copy link
Member

I would say default human reporting can do a smart switch between single line reporting for regular stdout/stderr pipe and more fancier reporting with indented multiline message for tty. But then provide specific formatting option overrides. Like tty, pipe, json.

Most programs (that I am aware of) don't do such smart switching. They either have two formats (one for interactive, one for non-interactive), or a single one (which is used everywhere). Adding multiple ways to switch context and information "density" for errors could be adding unnecessary complexity to the program.

There are 2 slightly conflicting goals here. One is a useful error message to the end user so they know how to CORRECT their "usage" of the program. Another is a useful error message for developers to understand how to CORRECT the behaviour of the program.

I'm not sure if I mentioned this somewhere before, but we can display a small one- or two-liner error on the user-facing side of PKCLI, and the detailed message can be printed in the logger. While it would blow up the size for the logs a bit, it would be useful separation to differentiate between information useful to end-users and information useful for developers.

I have done something similar for an application I am working on. Any errors are printed with error formatting. This is kept concise to allow for users to correct their usage. Due to the lack of an explicit logging system, I just print the traceback with a debug level. If the user experiences an error which is not triggered by improper usage, they can either check a file for detailed logs, or reproduce the issue with logger set to debug level to show the traceback. As we have a proper logger, we can print the detailed error via the logger while a concise error is shown to the users.

$ chroma load tokyodark
 EROR  LuaError - /home/aryanj/Projects/chroma/chroma/builtins/colors.lua:7: attempt to call a nil value (upvalue 'fn')
 EROR  Report this at https://github.com/aryanjassal/chroma/issues

$ chroma load tokyodark -v
 EROR  LuaError - /home/aryanj/Projects/chroma/chroma/builtins/colors.lua:7: attempt to call a nil value (upvalue 'fn')
 EROR  Report this at https://github.com/aryanjassal/chroma/issues
 DBUG  Traceback (most recent call last):
 DBUG    File "/home/aryanj/Projects/chroma/pip_packages/bin/chroma", line 8, in <module>
 DBUG      sys.exit(main())
 DBUG               ^^^^^^
 DBUG    File "/home/aryanj/Projects/chroma/chroma/main.py", line 141, in main
 DBUG      theme.load(
 DBUG    File "/home/aryanj/Projects/chroma/chroma/theme.py", line 98, in load
 DBUG      c = config["generators"]["generator_modes"]["background"](ColorHex('#ffffff'))
 DBUG          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 DBUG    File "lupa/lua54.pyx", line 946, in lupa.lua54._LuaObject.__call__
 DBUG    File "lupa/lua54.pyx", line 1918, in lupa.lua54.call_lua
 DBUG    File "lupa/lua54.pyx", line 1945, in lupa.lua54.execute_lua_call
 DBUG    File "lupa/lua54.pyx", line 1826, in lupa.lua54.raise_lua_error

@CMCDragonkai
Copy link
Member

Most programs (that I am aware of) don't do such smart switching. They either have two formats (one for interactive, one for non-interactive), or a single one (which is used everywhere). Adding multiple ways to switch context and information "density" for errors could be adding unnecessary complexity to the program.

The smart switching is done by checking the stderr pipe type. I've seen this functionality.

@CMCDragonkai
Copy link
Member

I have done something similar for an application I am working on. Any errors are printed with error formatting. This is kept concise to allow for users to correct their usage. Due to the lack of an explicit logging system, I just print the traceback with a debug level. If the user experiences an error which is not triggered by improper usage, they can either check a file for detailed logs, or reproduce the issue with logger set to debug level to show the traceback. As we have a proper logger, we can print the detailed error via the logger while a concise error is shown to the users.

Distributed logging and tracing collectors need to collect in a single line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development
Development

No branches or pull requests

3 participants