RFC: core: 1st party sql parser using winnow combinators #371

Open
wants to merge 18 commits into
base: main

jussisaurio
Collaborator

@jussisaurio jussisaurio commented Oct 16, 2024

experimenting with building a 1st party SQL parser. it is not integrated into the main limbo codebase yet and lives independently inside core/parser. it is not generated from any grammar: it uses the winnow parser combinator library for the tokenizer and handcrafted imperative parsing for the parser.
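To illustrate the split between a tokenizer and a handcrafted imperative parser, here is a minimal sketch. It is not the code in core/parser: the tokenizer is a plain std-only loop standing in for the winnow-based one, every type and function name (`Token`, `Select`, `Parser`, `parse`) is hypothetical, and it only handles `SELECT <* | column> FROM <table> [LIMIT <n>]`.

```rust
/// Tokens for a tiny SELECT subset (hypothetical, not the PR's token type).
#[derive(Debug, Clone, PartialEq)]
enum Token { Select, From, Limit, Star, Ident(String), Number(u64) }

/// Std-only tokenizer standing in for the winnow-based one described above.
fn tokenize(input: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '*' {
            chars.next();
            tokens.push(Token::Star);
        } else if c.is_ascii_alphanumeric() || c == '_' {
            // Consume a maximal run of word characters.
            let mut word = String::new();
            while let Some(&w) = chars.peek() {
                if !(w.is_ascii_alphanumeric() || w == '_') { break; }
                word.push(w);
                chars.next();
            }
            tokens.push(classify(word)?);
        } else {
            return Err(format!("unexpected character {c:?}"));
        }
    }
    Ok(tokens)
}

/// Turn a word into a number, a keyword (case-insensitive), or an identifier.
fn classify(word: String) -> Result<Token, String> {
    if word.starts_with(|ch: char| ch.is_ascii_digit()) {
        return word.parse::<u64>().map(Token::Number)
            .map_err(|e| format!("bad number {word:?}: {e}"));
    }
    Ok(match word.to_ascii_uppercase().as_str() {
        "SELECT" => Token::Select,
        "FROM" => Token::From,
        "LIMIT" => Token::Limit,
        _ => Token::Ident(word),
    })
}

#[derive(Debug, PartialEq)]
enum Column { Star, Name(String) }

#[derive(Debug, PartialEq)]
struct Select { column: Column, table: String, limit: Option<u64> }

/// Imperative parser: one method per grammar rule, no generated tables.
struct Parser { tokens: Vec<Token>, pos: usize }

impl Parser {
    fn advance(&mut self) -> Option<Token> {
        let tok = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        tok
    }

    fn expect(&mut self, want: Token) -> Result<(), String> {
        let got = self.advance();
        if got.as_ref() == Some(&want) { Ok(()) } else {
            Err(format!("expected {want:?}, got {got:?}"))
        }
    }

    fn parse_select(&mut self) -> Result<Select, String> {
        self.expect(Token::Select)?;
        let column = match self.advance() {
            Some(Token::Star) => Column::Star,
            Some(Token::Ident(name)) => Column::Name(name),
            other => return Err(format!("expected column, got {other:?}")),
        };
        self.expect(Token::From)?;
        let table = match self.advance() {
            Some(Token::Ident(name)) => name,
            other => return Err(format!("expected table name, got {other:?}")),
        };
        // LIMIT is optional; anything else after the table name is an error.
        let limit = match self.advance() {
            None => None,
            Some(Token::Limit) => match self.advance() {
                Some(Token::Number(n)) => Some(n),
                other => return Err(format!("expected number, got {other:?}")),
            },
            Some(other) => return Err(format!("unexpected token {other:?}")),
        };
        if let Some(extra) = self.advance() {
            return Err(format!("unexpected trailing token {extra:?}"));
        }
        Ok(Select { column, table, limit })
    }
}

fn parse(sql: &str) -> Result<Select, String> {
    let tokens = tokenize(sql)?;
    Parser { tokens, pos: 0 }.parse_select()
}
```

Calling `parse("select * from users limit 1")` yields a `Select` with `Column::Star`, table `users` and `limit: Some(1)`, while `parse("SELECT FROM t")` returns an error because no column precedes `FROM`.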

the main advantages of this are:

  • Modeling the AST is 100% in our control - we are free to attach any metadata to any AST node. there is already significant pain from not being able to do this with the AST provided by sqlite3_parser. For example, we cannot annotate that "this Expr is a column from a known table in our catalog, and it is column number 3", or that "this expression is a projection made on the result column number 4 of the Sort operator with id 'sorter_foo'"
  • Not reliant on a 3rd party AST for fixes
  • We can probably make it more performant than sqlite3_parser. Latest bench stats from CI:
parser/Parse SQL, sqlite3_parser: 'SELECT_STAR_LIMIT_1'
                        time:   [2.9284 µs 2.9308 µs 2.9334 µs]
                        thrpt:  [340.90 Kelem/s 341.20 Kelem/s 341.49 Kelem/s]
parser/Parse SQL, limbo: 'SELECT_STAR_LIMIT_1'
                        time:   [235.47 ns 236.58 ns 238.17 ns]
                        thrpt:  [4.1987 Melem/s 4.2268 Melem/s 4.2469 Melem/s]

parser/Parse SQL, sqlite3_parser: 'MULTIPLE_JOINS'
                        time:   [6.4634 µs 6.4705 µs 6.4798 µs]
                        thrpt:  [154.33 Kelem/s 154.55 Kelem/s 154.72 Kelem/s]
parser/Parse SQL, limbo: 'MULTIPLE_JOINS'
                        time:   [982.45 ns 984.13 ns 985.67 ns]
                        thrpt:  [1.0145 Melem/s 1.0161 Melem/s 1.0179 Melem/s]

parser/Parse SQL, sqlite3_parser: 'MULTIPLE_JOINS_WITH_WHERE'
                        time:   [9.4945 µs 9.4976 µs 9.5006 µs]
                        thrpt:  [105.26 Kelem/s 105.29 Kelem/s 105.32 Kelem/s]
parser/Parse SQL, limbo: 'MULTIPLE_JOINS_WITH_WHERE'
                        time:   [1.7499 µs 1.7569 µs 1.7704 µs]

parser/Parse SQL, sqlite3_parser: 'MULTIPLE_JOINS_WITH_WHERE_GROUPBY_AND_ORDERBY'
                        time:   [11.877 µs 11.887 µs 11.897 µs]
                        thrpt:  [84.056 Kelem/s 84.127 Kelem/s 84.197 Kelem/s]
parser/Parse SQL, limbo: 'MULTIPLE_JOINS_WITH_WHERE_GROUPBY_AND_ORDERBY'
                        time:   [2.1890 µs 2.1967 µs 2.2087 µs]
                        thrpt:  [452.75 Kelem/s 455.23 Kelem/s 456.83 Kelem/s]

Please note that these results are probably explained, at least in part, by our in-house parser not supporting the full SQL syntax yet.
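For readers who want the relative numbers, the median timings above can be turned into speedup ratios with a few lines of arithmetic. This is a minimal sketch: the constants are the middle value of each `time:` triple copied from the CI output, and the throughput helper assumes each benchmark iteration parses exactly one statement, which is consistent with the reported `thrpt` figures but is not confirmed by the PR text.

```rust
/// Median timings from the CI output above, in nanoseconds:
/// (query name, sqlite3_parser, limbo).
const MEDIANS_NS: [(&str, f64, f64); 4] = [
    ("SELECT_STAR_LIMIT_1", 2930.8, 236.58),
    ("MULTIPLE_JOINS", 6470.5, 984.13),
    ("MULTIPLE_JOINS_WITH_WHERE", 9497.6, 1756.9),
    ("MULTIPLE_JOINS_WITH_WHERE_GROUPBY_AND_ORDERBY", 11887.0, 2196.7),
];

/// Statements per second, assuming one statement is parsed per iteration.
fn throughput_per_sec(time_ns: f64) -> f64 {
    1e9 / time_ns
}

/// How many times faster the candidate is than the baseline.
fn speedup(baseline_ns: f64, candidate_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}

/// Speedup of the limbo parser over sqlite3_parser for each benchmark.
fn speedups() -> Vec<(&'static str, f64)> {
    MEDIANS_NS
        .iter()
        .map(|&(name, base, ours)| (name, speedup(base, ours)))
        .collect()
}
```

With these inputs the ratios come out to roughly 12.4x, 6.6x, 5.4x and 5.4x, so the gap narrows as the queries exercise more of the grammar.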

the main disadvantages are:

  • having to maintain all of this ourselves (maybe not a disadvantage)
  • requires a lot of fuzzing / property based testing to get 100% right

Only SELECT-related syntax is supported right now.

SELECT Statement

  • Basic SELECT ... FROM ... structure
  • Column selection
    • Star (*) selection
    • Individual column selection
    • Qualified column names (e.g., table.column)
  • Table selection in FROM clause
  • Table aliasing
  • WHERE clause with conditions
  • GROUP BY clause
  • ORDER BY clause
    • ASC and DESC directives
    • Multiple column ordering
  • LIMIT clause
  • OFFSET clause
  • HAVING clause
  • Subqueries

Operators and Expressions

  • Basic arithmetic operators (+, -, *, /)
  • Comparison operators (=, !=, <>, >, <, >=, <=)
  • Logical operators (AND, OR, NOT)
  • IN operator
  • NOT IN operator
  • LIKE operator
  • BETWEEN operator
  • Parenthesized expressions
  • Function calls
  • CASE expressions (both simple and searched)

JOINs

  • INNER JOIN
  • LEFT OUTER JOIN
  • Multiple joins in a single query
  • Join conditions (ON clause)
  • RIGHT OUTER JOIN
  • FULL OUTER JOIN
  • CROSS JOIN

Data Types and Literals

  • String literals
  • Numeric literals (integers and floats)
  • Date and time literals
  • Boolean literals
  • NULL

Functions

  • Basic function calls with arguments
  • Aggregate functions (SUM, AVG, etc.)
  • Window functions

Additional Features

  • Case insensitivity for keywords
  • Column aliasing
  • Common Table Expressions (CTEs)
  • Set operations (UNION, INTERSECT, EXCEPT)

Other SQL Statement Types

  • INSERT
  • UPDATE
  • DELETE
  • CREATE TABLE
  • ALTER TABLE
  • DROP TABLE
  • CREATE INDEX
  • CREATE VIEW

@jussisaurio jussisaurio changed the title wip sql parser using winnow combinators core: 1st party sql parser using winnow combinators Oct 21, 2024
@jussisaurio jussisaurio marked this pull request as ready for review October 21, 2024 15:15
@jussisaurio jussisaurio changed the title core: 1st party sql parser using winnow combinators RFC: core: 1st party sql parser using winnow combinators Oct 21, 2024
@penberg
Owner

penberg commented Oct 21, 2024

@jussisaurio A hand-rolled parser is going to be very hard to make compatible with SQLite. If you want more control over AST generation, a better path is likely porting https://github.com/gwenn/lemon-rs over to our source tree.

We can merge this parser too if you want to pursue it, but it needs to be under a feature flag and disabled by default.
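As a rough sketch of what "under a feature flag and disabled by default" could look like in Rust, the snippet below selects a parser backend at compile time with a Cargo feature. The feature name `experimental_parser` and the `Backend` and `active_backend` names are assumptions made for illustration, not something the PR or the limbo codebase defines.

```rust
// Cargo.toml fragment (hypothetical feature name, off by default):
//
//   [features]
//   default = []
//   experimental_parser = []

/// Which parser implementation the build uses.
#[derive(Debug, PartialEq)]
enum Backend {
    Sqlite3Parser,
    Experimental,
}

/// Compiled only when the crate is built with `--features experimental_parser`.
#[cfg(feature = "experimental_parser")]
fn active_backend() -> Backend {
    Backend::Experimental
}

/// The default build keeps using the existing parser.
#[cfg(not(feature = "experimental_parser"))]
fn active_backend() -> Backend {
    Backend::Sqlite3Parser
}
```

A default `cargo build` never compiles the experimental path, so the new parser can evolve in the tree without affecting users who have not opted in.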
