Report multiple errors, not just the first one #13676

Open
eliaperantoni opened this issue Dec 6, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@eliaperantoni

Is your feature request related to a problem or challenge?

In the following query there are 4 distinct errors:

WITH users AS (
	SELECT 1 AS id, 'John' AS name
)
SELECT 'id:' + idd, name FROM userss GROUP BY id;
  1. userss doesn't exist
  2. idd doesn't exist
  3. Can't add a string to a number
  4. name is missing from GROUP BY

DataFusion currently reports only one of those errors when you try to execute the query. After fixing it, you have to run the query again to see the next one.

This can be frustrating for the end user because it requires many iterations of a (possibly expensive and slow) parsing and planning step. Furthermore, reporting multiple errors would make it possible to build tooling such as a language server (LSP) on top of DataFusion.

The desired feature is for DataFusion to report as many errors as possible in one go.

Describe the solution you'd like

Programming language compilers do this quite well, I think. Take rustc for example: you can get tens of errors in one go and fix them all before invoking an expensive compilation again.

I think we should take inspiration from the way these compilers do it, e.g. panic mode and synchronization. See https://craftinginterpreters.com/parsing-expressions.html#panic-mode-error-recovery for an introduction.
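To make the idea concrete, here is a minimal sketch of panic-mode recovery on a comma-separated select list. Everything in it is hypothetical: the pre-split token slice, `is_identifier`, and `parse_select_list` are illustration only and have nothing to do with the actual SQL parser's API. The only point is the "record the error, skip to the next comma, continue" loop, with the comma acting as the synchronization point.

```rust
// Hypothetical helper: a token counts as an identifier if it starts with a
// letter or underscore and contains only ASCII alphanumerics or underscores.
fn is_identifier(token: &str) -> bool {
    let mut chars = token.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

// Hypothetical parser for a select list where each item is a single identifier.
// Returns the items that parsed successfully and every error encountered.
fn parse_select_list(tokens: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut items = Vec::new();
    let mut errors = Vec::new();
    let mut pos = 0;

    while pos < tokens.len() {
        // A valid item is an identifier followed by "," or by the end of input.
        let at_item_end = pos + 1 == tokens.len() || tokens[pos + 1] == ",";
        if is_identifier(tokens[pos]) && at_item_end {
            items.push(tokens[pos].to_string());
            pos += 2; // consume the identifier and the comma, if any
        } else {
            errors.push(format!(
                "unexpected token '{}' at position {}",
                tokens[pos], pos
            ));
            // Panic mode: discard tokens until the next comma, which is the
            // synchronization point, then resume with the following item.
            while pos < tokens.len() && tokens[pos] != "," {
                pos += 1;
            }
            pos += 1;
        }
    }

    (items, errors)
}
```

With the tokens `id , + 1 , name`, the middle item is reported as an error and `name` is still parsed, instead of the whole list being abandoned at the first problem.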

The way it could work is: when parsing or planning the SelectItems in a Select, we catch any error coming from one SelectItem, store it in a local variable, and proceed with the next item. Then, if there were any errors, we return all of them together. We could add a DataFusionError::Many(Vec<DataFusionError>) variant to represent this.
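A rough sketch of what I have in mind, assuming a simplified stand-in for the error type. `PlanError`, `plan_select_item`, and `plan_select_items` are hypothetical names for illustration, not existing DataFusion code, and `Many` is the proposed variant, which does not exist today.

```rust
// Hypothetical stand-in for DataFusionError. `Many` is the variant proposed
// in this issue; it is not part of DataFusion at the moment.
#[derive(Debug, PartialEq)]
enum PlanError {
    Plan(String),
    Many(Vec<PlanError>),
}

// Hypothetical per-item planner: only `id` and `name` are known columns.
fn plan_select_item(item: &str) -> Result<String, PlanError> {
    match item {
        "id" | "name" => Ok(item.to_string()),
        other => Err(PlanError::Plan(format!("column '{}' not found", other))),
    }
}

// Plan every item, storing errors instead of returning at the first one.
fn plan_select_items(items: &[&str]) -> Result<Vec<String>, PlanError> {
    let mut planned = Vec::new();
    let mut errors = Vec::new();

    for item in items {
        match plan_select_item(item) {
            Ok(expr) => planned.push(expr),
            // Store the error and keep going with the next item.
            Err(e) => errors.push(e),
        }
    }

    if errors.is_empty() {
        Ok(planned)
    } else {
        // Return the whole collection in one go.
        Err(PlanError::Many(errors))
    }
}
```

Calling `plan_select_items(&["idd", "name", "nme"])` would then return a single `Many` holding both "not found" errors, rather than stopping at `idd`.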

The same idea of "storing the error for later, synchronising to the next safe point, and continuing" could also be applied across the different parts of a query (e.g. the CTEs, the SELECT, the WHERE, the ORDER BY, etc.). After any error in the CTEs section, we can continue with the SELECT and collect the errors there, then move on to the WHERE, and so on. It could also be applied when analysing different Statements.
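At the clause level this could look something like the sketch below. Again, all names are hypothetical (`ErrorCollector`, `plan_from`, `plan_select`, `plan_query`), and `PlanError` is a simplified stand-in for DataFusionError with the proposed `Many` variant. Each clause is treated as a safe point: its error is recorded, and planning moves on to the next clause.

```rust
// Hypothetical stand-in for DataFusionError with the proposed `Many` variant.
#[derive(Debug, PartialEq)]
enum PlanError {
    Plan(String),
    Many(Vec<PlanError>),
}

// Hypothetical helper that accumulates errors across clauses.
#[derive(Default)]
struct ErrorCollector {
    errors: Vec<PlanError>,
}

impl ErrorCollector {
    // Record any error (flattening one level of `Many`) and return `None`
    // so the caller can continue with the next clause.
    fn recover<T>(&mut self, result: Result<T, PlanError>) -> Option<T> {
        match result {
            Ok(value) => Some(value),
            Err(PlanError::Many(inner)) => {
                self.errors.extend(inner);
                None
            }
            Err(e) => {
                self.errors.push(e);
                None
            }
        }
    }
}

// Hypothetical FROM planner: only the table `users` exists.
fn plan_from(table: &str) -> Result<String, PlanError> {
    if table == "users" {
        Ok(table.to_string())
    } else {
        Err(PlanError::Plan(format!("table '{}' not found", table)))
    }
}

// Hypothetical SELECT planner: only `id` and `name` are known columns.
fn plan_select(columns: &[&str]) -> Result<Vec<String>, PlanError> {
    let mut planned = Vec::new();
    let mut errors = Vec::new();
    for column in columns {
        if matches!(*column, "id" | "name") {
            planned.push((*column).to_string());
        } else {
            errors.push(PlanError::Plan(format!("column '{}' not found", column)));
        }
    }
    if errors.is_empty() {
        Ok(planned)
    } else {
        Err(PlanError::Many(errors))
    }
}

// Plan each clause in turn; a failure in one clause does not stop the next.
fn plan_query(table: &str, columns: &[&str]) -> Result<(String, Vec<String>), PlanError> {
    let mut collector = ErrorCollector::default();
    let from = collector.recover(plan_from(table));
    let select = collector.recover(plan_select(columns));

    match (from, select) {
        (Some(from), Some(select)) => Ok((from, select)),
        _ => Err(PlanError::Many(collector.errors)),
    }
}
```

One caveat with this approach: an error in an earlier clause (e.g. an unknown table) can cause spurious follow-up errors in later clauses, so a real implementation would probably need some way to suppress those.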

Describe alternatives you've considered

No response

Additional context

This is related to issue #13662 and my PR about diagnostics #13664. I'd be happy to work on this issue too if contributions are welcome.

@eliaperantoni eliaperantoni added the enhancement New feature or request label Dec 6, 2024