Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conflict target ambiguity and other questions #94

Open
johnedquinn opened this issue Dec 3, 2024 · 1 comment
Open

Conflict target ambiguity and other questions #94

johnedquinn opened this issue Dec 3, 2024 · 1 comment
Labels
question Further information is requested

Comments

@johnedquinn
Copy link
Member

johnedquinn commented Dec 3, 2024

From RFC-0011, the conflict target can either contain a constraint name, index targets, primary keys, or composite primary keys.

<conflict target> ::=
    ( <index target> [, <index target>]... )
    | ( { <primary key> | <composite primary key> } )
    | ON CONSTRAINT <constraint name>
    ;

<index target> ::= <index attr name>;

<primary key> ::= <attr name>;

<composite primary key> ::= <attr name>, <attr name> [, <attr name> ]...;

From my reading of the rules, there is an ambiguity between the variants of index target, primary key, and composite primary key. Essentially, when the parser sees a left parenthesis, it always expects an attr name, which is currently an identifier (delimited or not).

Can someone provide some clarity on the following questions? Looping in @am357.

  1. On the parse, upon encountering a left parenthesis and 1+ identifiers, how do we know which EBNF rule is followed? Should it be parsed into a generic AST structure that holds 1+ identifiers? From the RFC, it seems like we semantically analyze it but I'd like to be sure. See the following question.
  2. As a follow-up to Q1 above, can I semantically place identifiers corresponding to an index and some to a primary key in the same list? Is that allowed? Since they share the same syntax, this is largely a semantic question. Since I'm currently writing Adds DML (INSERT, DELETE, UPDATE, UPSERT, REPLACE) partiql-lang-kotlin#1666, I'd like some clarity on the modeling of the AST nodes for this. To disambiguate this, I'm honestly confused about the use of the primary key rule at all. Especially since SQL tables can only have a single primary key (whether it's multiple columns or not), and primary keys are constraints. So one could (and probably should in my opinion) use ON CONSTRAINT PRIMARY KEY.
  3. From the EBNF in the RFC, attr name is an identifier (delimited or regular). Is there any future with allowing for paths and nested paths? This is just a theoretical question, but just curious.
@am357
Copy link
Contributor

am357 commented Dec 13, 2024

Had an offline discussion with @johnedquinn about this. Adding the summary:

On the parse, upon encountering a left parenthesis and 1+ identifiers, how do we know which EBNF rule is followed? Should it be parsed into a generic AST structure that holds 1+ identifiers? From the RFC, it seems like we semantically analyze it but I'd like to be sure. See the following question.

As a follow-up to Q1 above, can I semantically place identifiers corresponding to an index and some to a primary key in the same list? Is that allowed? Since they share the same syntax, this is largely a semantic question. Since I'm currently writing partiql/partiql-lang-kotlin#1666, I'd like some clarity on the modeling of the AST nodes for this. To disambiguate this, I'm honestly confused about the use of the primary key rule at all. Especially since SQL tables can only have a single primary key (whether it's multiple columns or not), and primary keys are constraints. So one could (and probably should in my opinion) use ON CONSTRAINT PRIMARY KEY.

The short answer is yes. Looking back we can collapse the two and use one AST structure. The RFC is written to both support SQL (e.g. PostgresSQL) and NoSQL (e.g., DynamoDB) databases, hence having a motivation for making a distinction between PRIMARY_KEY and other constraints.

SQL tables can only have a single primary key (whether it's multiple columns or not)

True, but NoSQL tables can have composite keys (such as DynamoDB). In SQL tables, for example in PostgresSQL, one case still define a constraint (e.g., UNIQUE) on multiple columns and check for it using ON CONFLICT (col1, col2)

From the EBNF in the RFC, attr name is an identifier (delimited or regular). Is there any future with allowing for paths and nested paths? This is just a theoretical question, but just curious.

I remember this was discussed during our RFC work, but since it opens a whole new set of ambiguities with regards managing partial schemas we decided at the time to keep the scope limited to the identifiers. Retrospectively, I see this is missing from mentioning in the RFC's Future possibilities section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants