Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test code for grammar specification #2

Closed
wants to merge 10 commits into from
Closed

Conversation

sigmaith
Copy link
Contributor

@sigmaith sigmaith commented Oct 7, 2024

What this PR does / why we need it:

Here's what we need to do for schema validation.

Device A. Devices that validate user-defined schemas
Device B. Devices that validate user-altered values with pre-defined schema

Regarding device A, we would like to support the following specifications.

Element level schema definition:

  • A type definition syntax similar to TypeScript is used.
  • Yorkie’s primitive types can be used.
  • Yorkie's element-level default types can be utilized.
    (yorkie.Array, yorkie.Object, yorkie.Counter, yorkie.Text, yorkie.Tree)
  • Additionally, user-defined types are allowed.
  • Undefined types are not permitted.

Tree level schema definition:

  • Further discussion is required.

In detail, I have written test code to validate the following grammar specifications.

CheckList

TypeScript-Like Grammar

  • Add TypeScript and other type tests to ensure errors are correctly thrown.
  • ; (semicolon supported)
  • Array<Todo>, Array<Array<Todo>> support
    (as -is: onlyTodo[]supported)
  • ? optional properties
  • | union operator
  • Literal type
  • annotation (#, // skip)
  • & Intersection Types
  • Utility Types

Yorkie Primitive Types

  • string, number, boolean, null, bigint, Uint8Array, Date
  • An error occurs when unsupported types (long, bytes) are set.

Yorkie Element & Generic Parameter

  • yorkie.Object, yorkie.Array, yorkie.Counter, yorkie.Text, yorkie.Tree
  • yorkie.Array -> fail, yorkie.Array<Todo> -> pass
  • yorkie.Object-> fail,yorkie.Object<Todo> -> pass
  • yorkie.Counter -> pass, yorkie.Counter<Todo> -> fail
  • yorkie.Text<Attribute> -> pass (Attribute: pre-defined type)
  • yorkie.Text<{bold: boolean; italic: boolean;}> -> pass

User-Defined Type

  • user-defined type
    e.g. type Todo { title: string; completed: boolean; }
  • undefined type error

Error Handling

  • Use reserved word
  • Type cycle

Follow-up Implementations Needed

  • Refactor grammar to pass all tests
  • Customize ANTLR's baseErrorListener to define and throw parseError(syntaxError)
  • Implement an AST visitor or listener to validate semanticError(e.g. use undefined type, type cycle)
  • Develop an ErrorReporter to comprehensively report all errors

Summary by CodeRabbit

  • New Features

    • Introduced a comprehensive overhaul of the schema grammar, enhancing parsing capabilities with new types and structures.
    • Added support for Yorkie primitive types and user-defined types in the schema validation.
    • Expanded the test suite for schema validation, covering various scenarios and edge cases.
  • Bug Fixes

    • Improved error handling in the validation process to provide more descriptive error messages.
  • Documentation

    • Enhanced clarity in the test structure and coverage for schema validation scenarios.

@sigmaith sigmaith changed the title Schema validation Add test code for grammar specification Oct 7, 2024
@sigmaith sigmaith marked this pull request as draft October 7, 2024 02:42
@hackerwins hackerwins marked this pull request as ready for review October 7, 2024 03:15
@hackerwins hackerwins marked this pull request as draft October 7, 2024 04:52
@hackerwins hackerwins marked this pull request as ready for review October 7, 2024 04:52
Copy link

coderabbitai bot commented Oct 7, 2024

Walkthrough

The changes involve a significant overhaul of the grammar in the Schema.g4 file, replacing the entry point and restructuring various rules to enhance the parsing capabilities. New lexer rules and constructs have been introduced, while several existing rules have been removed or modified. Corresponding updates in the src/validator.ts file include the addition of new parsing contexts and visitor methods, along with improved error handling. The test suite in test/schema.test.ts has been restructured to cover a broader range of scenarios and improve clarity in schema validation.

Changes

File Change Summary
antlr/Schema.g4 - Replaced entry point start with document.
- Removed typeDefinitions and typeDefinition rules.
- Introduced new lexer rules: NEWLINE, COMMENT, DIGIT, and various YORKIE_* keywords.
- Restructured rules for definitions and types, adding definitionList, fieldDefList, and expanded type rules.
antlr/Schema.interp - Added new token literal names and symbolic names, including null, bigint, Uint8Array, YORKIE_* types.
- Updated rule names to reflect new grammar structure.
antlr/Schema.tokens - Updated token definitions, added new tokens like NEWLINE, COMMENT, DIGIT, and YORKIE_* types, and reordered existing tokens.
antlr/SchemaLexer.interp - Expanded token literal names and symbolic names, including YORKIE_* types and various symbols.
- Updated rule names to accommodate new tokens.
antlr/SchemaLexer.tokens - Modified token definitions by adding new tokens and adjusting their numeric values.
antlr/SchemaLexer.ts - Renumbered existing token constants and added new ones, including NEWLINE, COMMENT, DIGIT, and YORKIE_* types.
- Updated rule names and internal mappings.
antlr/SchemaListener.ts - Replaced several existing methods with new ones to align with the updated grammar structure, such as enterDocument and exitDocument.
antlr/SchemaParser.ts - Renamed static rule identifiers and restructured parsing methods to reflect the new grammar, including the addition of new context classes.
antlr/SchemaVisitor.ts - Removed outdated visit methods and added new ones corresponding to the updated grammar, such as visitDocument and visitDefinitionList.
src/validator.ts - Added new parsing contexts and visitor methods, updated error handling in validate function, and improved error messages in visitErrorNode.
test/schema.test.ts - Restructured test cases into categories, expanded coverage for various schema scenarios, and replaced a single test case with multiple focused tests.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Validator
    participant Parser
    participant Schema

    User->>Validator: Validate Schema
    Validator->>Parser: Parse Document
    Parser->>Schema: Process Definitions
    Schema-->>Parser: Return Parsed Structure
    Parser-->>Validator: Return Context
    Validator-->>User: Validation Result
Loading

🐰 In the land of code where rabbits play,
New grammar rules hop in, bright as day.
With fields and types all set to explore,
Our schema's now richer, oh what a score!
Validations are swift, errors now clear,
Let's dance through the tests, with nothing to fear!
🐇✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@sigmaith sigmaith marked this pull request as draft October 7, 2024 04:53
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Outside diff range and nitpick comments (4)
antlr/Schema.g4 (2)

18-31: Remove unused lexer tokens to simplify the grammar

The lexer rules from lines 18 to 31 define several tokens:

MINUS: '-';
SEMICOLON: ';';
LPAREN: '(';
RPAREN: ')';
LCURLY: '{';
RCURLY: '}';
GT: '>';
LT: '<';
PIPE: '|';
QUESTION: '?';
EQ: '=';
COMMA: ',';
LSQUARE: '[';
RSQUARE: ']';

However, tokens like EQ: '='; (line 28), LSQUARE, and RSQUARE are not used in any parser rules. Keeping unused tokens can lead to confusion and make the grammar harder to maintain.

Consider removing these unused tokens to streamline the grammar:

-EQ: '=';
-LSQUARE: '[';
-RSQUARE: ']';

75-82: Ensure consistent capitalization for primitive types

In the primitiveType rule, there is a mix of lowercase and capitalized type names:

primitiveType:
    'string'
    | 'number'
    | 'boolean'
    | 'null'
    | 'bigint'
    | 'Uint8Array'
    | 'Date';

Most types are lowercase, while 'Uint8Array' and 'Date' are capitalized.

For consistency, consider standardizing the capitalization. If these types are meant to be case-sensitive and capitalized as in JavaScript, ensure that this is intentional and documented.

Alternatively, adjust the types to match the casing:

    | 'string'
    | 'number'
    | 'boolean'
    | 'null'
    | 'bigint'
-   | 'Uint8Array'
-   | 'Date';
+   | 'uint8array'
+   | 'date';
src/validator.ts (1)

80-80: Consider enhancing error handling in visitErrorNode.

Currently, visitErrorNode throws an error when a syntax error is encountered, which will halt the parsing process at the first error detected. Depending on your requirements, you might want to collect all syntax errors to provide comprehensive feedback or implement a custom error reporting mechanism that offers more context.

test/schema.test.ts (1)

134-143: Address TODO: Enable users to define properties of yorkie.Text

The test case is currently skipped and includes a TODO about allowing users to define the properties of the yorkie.Text data structure. If assistance is needed to implement this feature, I can help.

Would you like me to assist in implementing this functionality or open a new GitHub issue to track this task?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 60fd78d and f652a32.

📒 Files selected for processing (3)
  • antlr/Schema.g4 (1 hunks)
  • src/validator.ts (3 hunks)
  • test/schema.test.ts (1 hunks)
🧰 Additional context used
🔇 Additional comments (2)
antlr/Schema.g4 (2)

5-8: Review overlapping rules for whitespace and newlines

In lines 5 to 8, the lexer rules are defined as:

NEWLINE: '\r'? '\n' -> skip;
COMMENT: ('//' | '#') ~[\r\n]* -> skip;
DIGIT: [0-9];
WHITESPACE: [ \t\n\r]+ -> skip;

Notice that both NEWLINE and WHITESPACE include newline characters, and both are being skipped. This overlap may cause redundancy or unexpected behavior in how newlines are handled.

To ensure that newlines are processed correctly without redundancy, consider reviewing these rules and possibly adjusting them so that NEWLINE handles only line endings and WHITESPACE handles spaces and tabs exclusively.


65-71: Review potential left recursion or ambiguity in nonUnionTypeL2

The nonUnionTypeL2 rule allows for nested types and groupings:

nonUnionTypeL2:
    LPAREN type RPAREN
    | objectLiteralType
    | primitiveType
    | literalType
    | yorkieType
    | typeReference;

Ensure that this rule doesn't introduce left recursion or ambiguities that could lead to parsing issues.

Consider testing complex nested type expressions to confirm that the grammar correctly parses them without errors.

antlr/Schema.g4 Outdated Show resolved Hide resolved
antlr/Schema.g4 Show resolved Hide resolved
antlr/Schema.g4 Outdated Show resolved Hide resolved
src/validator.ts Outdated Show resolved Hide resolved
test/schema.test.ts Outdated Show resolved Hide resolved
test/schema.test.ts Outdated Show resolved Hide resolved
test/schema.test.ts Outdated Show resolved Hide resolved
test/schema.test.ts Outdated Show resolved Hide resolved
@sigmaith sigmaith self-assigned this Oct 7, 2024
antlr/Schema.g4 Outdated Show resolved Hide resolved
@hackerwins hackerwins marked this pull request as ready for review October 7, 2024 06:24
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🧹 Outside diff range and nitpick comments (4)
antlr/Schema.tokens (1)

Line range hint 1-69: Summary of changes to Schema.tokens

The modifications to Schema.tokens represent a significant update to the grammar's token definitions. These changes align well with the PR objectives of enhancing schema validation capabilities for the Yorkie framework. The additions and reorganizations include:

  1. New tokens for Yorkie-specific types (yorkie.Object, yorkie.Array, etc.).
  2. Additional primitive types (null, bigint, Uint8Array, Date).
  3. Reorganization of existing token values.
  4. New tokens for improved parsing capabilities (NEWLINE, COMMENT, DIGIT, etc.).

These changes should provide a more robust foundation for schema validation, supporting both user-defined schemas and validation against pre-defined schemas as outlined in the PR objectives.

As these changes impact the core grammar of the schema validation system, ensure that:

  1. The corresponding grammar file (Schema.g4) is updated to utilize these new tokens effectively.
  2. The parser and validator implementations in TypeScript are updated to handle the new token types and values.
  3. The test suite is expanded to cover the new capabilities introduced by these token changes, particularly focusing on the new Yorkie-specific types and additional primitive types.
src/validator.ts (1)

87-104: LGTM: Improved error handling in the validate function.

The changes to the validate function significantly improve error handling and provide more robust validation. The use of a try-catch block to handle potential errors during parsing and visiting is a good practice.

One minor suggestion for improvement:

Consider adding a more specific error type for schema validation errors. This could help distinguish between syntax errors and semantic errors in the schema. For example:

class SchemaValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'SchemaValidationError';
  }
}

// Then in the catch block:
if (error instanceof SchemaValidationError) {
  console.error(`Schema validation error: ${error.message}`);
} else if (error instanceof Error) {
  console.error(`Unexpected error during validation: ${error.message}`);
} else {
  console.error(`Unknown error: ${String(error)}`);
}

This would provide more granular error handling and logging.

antlr/Schema.interp (1)

Line range hint 111-221: Consider excluding generated files from version control

The atn section (lines 111~ to 221~) appears to contain auto-generated code. Including generated files like .interp files in version control can lead to unnecessary repository bloat and potential merge conflicts. It's a common best practice to exclude such files from the repository.

You might want to add antlr/Schema.interp to your .gitignore file to prevent it from being committed. This way, the file can be generated locally as needed without being tracked in version control.

antlr/SchemaLexer.ts (1)

Line range hint 1-6: Consider excluding generated files from version control.

This file is generated from Schema.g4 by ANTLR, as indicated by the comment at the top. It's generally a best practice to exclude generated files from version control systems to reduce repository size and avoid potential merge conflicts. Instead, regenerate the files as part of the build process.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f652a32 and a2b9afc.

📒 Files selected for processing (11)
  • antlr/Schema.g4 (1 hunks)
  • antlr/Schema.interp (2 hunks)
  • antlr/Schema.tokens (1 hunks)
  • antlr/SchemaLexer.interp (4 hunks)
  • antlr/SchemaLexer.tokens (1 hunks)
  • antlr/SchemaLexer.ts (3 hunks)
  • antlr/SchemaListener.ts (3 hunks)
  • antlr/SchemaParser.ts (14 hunks)
  • antlr/SchemaVisitor.ts (2 hunks)
  • src/validator.ts (2 hunks)
  • test/schema.test.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/schema.test.ts
🧰 Additional context used
🪛 Biome
antlr/SchemaParser.ts

[error] 1339-1341: This constructor is unnecessary.

Unsafe fix: Remove the unnecessary constructor.

(lint/complexity/noUselessConstructor)


[error] 1386-1388: This constructor is unnecessary.

Unsafe fix: Remove the unnecessary constructor.

(lint/complexity/noUselessConstructor)


[error] 1418-1420: This constructor is unnecessary.

Unsafe fix: Remove the unnecessary constructor.

(lint/complexity/noUselessConstructor)


[error] 1448-1450: This constructor is unnecessary.

Unsafe fix: Remove the unnecessary constructor.

(lint/complexity/noUselessConstructor)


[error] 1495-1497: This constructor is unnecessary.

Unsafe fix: Remove the unnecessary constructor.

(lint/complexity/noUselessConstructor)


[error] 1560-1562: This constructor is unnecessary.

Unsafe fix: Remove the unnecessary constructor.

(lint/complexity/noUselessConstructor)


[error] 1590-1592: This constructor is unnecessary.

Unsafe fix: Remove the unnecessary constructor.

(lint/complexity/noUselessConstructor)

🔇 Additional comments (58)
antlr/Schema.tokens (3)

14-38: New tokens added to enhance grammar capabilities

The addition of new tokens (NEWLINE, COMMENT, DIGIT, YORKIE_* tokens, and various punctuation tokens) suggests an expansion of the grammar's capabilities. This is a positive change that will likely improve the expressiveness of the schema definition language.

To ensure these new tokens are properly utilized, run the following script:

#!/bin/bash
# Description: Verify usage of new tokens in the grammar file

# Check for usage of new tokens in Schema.g4
echo "Checking Schema.g4 for usage of new tokens:"
rg --type antlr "NEWLINE|COMMENT|DIGIT|YORKIE_OBJECT|YORKIE_ARRAY|YORKIE_COUNTER|YORKIE_TEXT|YORKIE_TREE|MINUS|SEMICOLON|LPAREN|RPAREN|LCURLY|RCURLY|GT|LT|PIPE|QUESTION|EQ|COMMA|LSQUARE|RSQUARE|DOUBLE_QUOTED_STRING|SINGLE_QUOTED_STRING" antlr/Schema.g4

41-69: Token value assignments updated and new tokens added

The token value assignments have been significantly updated:

  1. 'string', 'number', and 'boolean' now have values 3, 4, and 5 respectively.
  2. New tokens for 'null', 'bigint', 'Uint8Array', and 'Date' have been added.
  3. Yorkie-specific tokens (yorkie.Object, yorkie.Array, etc.) have been assigned values 18-22.
  4. Various punctuation tokens have been assigned values.

These changes reflect a reorganization of the token values and the introduction of new types, which aligns with the PR objectives of enhancing schema validation capabilities.

To ensure these changes are consistently applied, run the following script:

#!/bin/bash
# Description: Verify token value consistency in grammar and TypeScript files

# Check Schema.g4 file for token usage
echo "Checking Schema.g4 for token usage:"
rg --type antlr "string|number|boolean|null|bigint|Uint8Array|Date|yorkie\.Object|yorkie\.Array|yorkie\.Counter|yorkie\.Text|yorkie\.Tree" antlr/Schema.g4

# Check TypeScript files for token value usage
echo "Checking TypeScript files for token value usage:"
rg --type ts "3|4|5|6|7|8|9|18|19|20|21|22" src

13-13: Verify token value consistency

The IDENTIFIER and WHITESPACE tokens have been shifted. Ensure that these changes are consistent with the corresponding updates in the grammar file (Schema.g4) and any dependent code that uses these token values.

Run the following script to check for consistency:

Also applies to: 17-17

✅ Verification successful

Token definitions are consistent

The IDENTIFIER and WHITESPACE tokens maintain consistent values across Schema.g4, Schema.tokens, and the codebase. No inconsistencies or issues were found regarding these token assignments.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify token value consistency across files

# Check Schema.g4 file for token definitions
echo "Checking Schema.g4 for token definitions:"
rg --type antlr "IDENTIFIER|WHITESPACE" antlr/Schema.g4

# Check for usage of token values in TypeScript files
echo "Checking TypeScript files for token value usage:"
rg --type ts "13|17" src

Length of output: 784


Script:

#!/bin/bash
# Description: Extract token definitions and search for their usage

# Display the contents of antlr/Schema.tokens
echo "Contents of antlr/Schema.tokens:"
cat antlr/Schema.tokens

# Extract token definitions from antlr/Schema.g4 using ast-grep
echo "Extracting token definitions from Schema.g4:"
ast-grep --lang antlr --pattern $'token IDENTIFIER = $_;' antlr/Schema.g4
ast-grep --lang antlr --pattern $'token WHITESPACE = $_;' antlr/Schema.g4

# Search the entire codebase for usage of token values 13 and 17
echo "Searching for usage of token values 13 and 17 in the codebase:"
rg '\b(13|17)\b'

Length of output: 22498


Script:

#!/bin/bash
# Description: Extract and verify token definitions from Schema.g4

# Display the contents of antlr/Schema.g4
echo "Contents of antlr/Schema.g4:"
cat antlr/Schema.g4

# Search for IDENTIFIER and WHITESPACE token definitions in Schema.g4
echo "Searching for IDENTIFIER and WHITESPACE token definitions in Schema.g4:"
rg 'token\s+(IDENTIFIER|WHITESPACE)\s*=' antlr/Schema.g4

# Optionally, search the entire codebase for usage of token names
echo "Searching the codebase for usage of token names IDENTIFIER and WHITESPACE:"
rg '\bIDENTIFIER\b|\bWHITESPACE\b'

Length of output: 6332

antlr/SchemaLexer.tokens (4)

14-38: New token additions align well with PR objectives

The new tokens added (lines 14-38) are appropriate for a schema definition language and align well with the PR objectives. Specifically:

  1. The Yorkie-specific tokens (YORKIE_OBJECT, YORKIE_ARRAY, YORKIE_COUNTER, YORKIE_TEXT, YORKIE_TREE) support the requirement to utilize Yorkie's primitive types.
  2. The addition of tokens for common programming language symbols (MINUS, SEMICOLON, LPAREN, RPAREN, etc.) supports the TypeScript-like syntax requirement.
  3. Tokens like NEWLINE, COMMENT, and DIGIT enhance the lexer's capabilities for parsing the schema language.

The token numbering is consistent and sequential, which is good for maintainability.


12-13: Reordering of existing tokens is appropriate

The reordering of existing tokens (T__11, IDENTIFIER, and WHITESPACE) is appropriate and maintains a logical order while accommodating the new tokens. This change doesn't introduce any apparent issues or inconsistencies.

Also applies to: 17-17


40-69: New string literal assignments support PR objectives

The new string literal assignments (lines 40-69) are consistent with the token definitions and support the PR objectives:

  1. The assignments for Yorkie-specific types (lines 51-55: 'yorkie.Object', 'yorkie.Array', etc.) directly support the requirement to utilize Yorkie's primitive types.
  2. The addition of string literals for common programming symbols and keywords (e.g., 'null', 'bigint', 'true', 'false') supports the TypeScript-like syntax requirement.
  3. The assignments are correctly mapped to their corresponding token definitions, ensuring consistency in the lexer.

These additions enhance the lexer's capability to recognize and tokenize the schema language elements as specified in the PR objectives.


Line range hint 1-69: Overall changes align well with PR objectives

The modifications to the SchemaLexer.tokens file provide a solid foundation for the schema validation capabilities outlined in the PR objectives. The changes support both the TypeScript-like syntax requirement and the use of Yorkie primitive types.

Key points:

  1. New tokens and string literals for Yorkie-specific types have been added.
  2. Additional tokens and literals for common programming language elements support the TypeScript-like syntax.
  3. The overall structure and ordering of tokens remain consistent and logical.

These changes set the stage for implementing the grammar specifications necessary for schema validation in the Yorkie framework.

antlr/Schema.g4 (9)

3-9: LGTM: Lexer rules are well-defined

The lexer rules for IDENTIFIER, NEWLINE, COMMENT, DIGIT, and WHITESPACE are appropriately defined. Skipping NEWLINE and COMMENT is a standard practice that helps in parsing the meaningful parts of the input.


10-15: LGTM: Yorkie type keywords are well-defined

The keywords for Yorkie types (YORKIE_OBJECT, YORKIE_ARRAY, YORKIE_COUNTER, YORKIE_TEXT, YORKIE_TREE) are correctly defined and align with the PR objectives. The naming convention is consistent and clear.


17-31: LGTM: Operations and symbols are comprehensive

The defined operations and symbols cover a wide range of necessary elements for the grammar, including parentheses, curly braces, comparison operators, and others. The naming is clear and follows ANTLR conventions.


33-36: LGTM: String handling rules are well-defined

The rules for handling double-quoted and single-quoted strings, along with the ESC fragment for escape sequences, are comprehensively defined. This allows for flexible string representation in the grammar.


38-41: LGTM: Top-level structure is well-defined

The top-level rule 'document' and the 'definitionList' are well-structured. This allows for a flexible document format with multiple definitions, and the use of EOF ensures the entire input is parsed.


43-50: LGTM: Object type definition structure is well-defined

The 'definition' and 'objectTypeDefinition' rules are well-structured and align with the TypeScript-like syntax mentioned in the PR objectives. The flexibility in whitespace and optional trailing semicolon allows for various formatting styles.


60-70: LGTM: Type definition rules are comprehensive

The type definition rules, including 'type', 'nonUnionType', and 'nonUnionTypeL2', provide a comprehensive structure for defining various types. This includes support for union types, arrays, object literals, primitives, literals, Yorkie types, and type references, aligning well with the PR objectives.


110-110: Clarify the intention for yorkieTreeType

The yorkieTreeType is defined with empty angle brackets:

yorkieTreeType: YORKIE_TREE LT GT;

While this might be intentional, it's different from the other Yorkie types that allow for type parameters.

Could you clarify if this is the intended definition for yorkieTreeType? If it should accept type parameters like the other Yorkie types, consider updating it to:

yorkieTreeType: YORKIE_TREE LT (typeReference | objectLiteralType) GT;

1-110: LGTM: Comprehensive grammar definition for Yorkie schema

The Schema.g4 file provides a comprehensive grammar definition that aligns well with the PR objectives. It covers all aspects of schema definition, including:

  1. TypeScript-like syntax for type definitions
  2. Support for Yorkie primitive types (Object, Array, Counter, Text, Tree)
  3. Flexible type definitions, including union types and literals
  4. Support for user-defined types

The grammar provides a solid foundation for schema validation in the Yorkie framework. With the suggested minor improvements and clarifications, this grammar will effectively support the schema validation requirements outlined in the PR objectives.

src/validator.ts (3)

6-31: LGTM: Import statements and context types are well-organized.

The new import statements and context types are correctly added and logically grouped. These additions align with the changes in the Visitor class, providing the necessary types for the updated parser.


32-34: LGTM: Additional imports for node types are appropriate.

The new imports for ErrorNode, RuleNode, and TerminalNode are correctly added. These are necessary for the updated Visitor class implementation and align with the changes in the parsing logic.


80-82: LGTM: Improved error handling in visitErrorNode.

The updated visitErrorNode method now throws an error with a more specific message, including the text of the node where the syntax error occurred. This change enhances error reporting and will be beneficial for debugging.

antlr/SchemaVisitor.ts (2)

41-206: New visit methods align with updated grammar definitions

The added visit methods in the SchemaVisitor interface correctly reflect the changes in the grammar. This ensures that the visitor can handle the new parse tree structures generated by the updated SchemaParser.


41-206: ⚠️ Potential issue

Remove line number annotations from JSDoc comments for clarity

The inclusion of line number annotations (e.g., 41~, 42~, etc.) within the JSDoc comments is unnecessary and can reduce code readability. These annotations should be removed to keep the documentation clean.

Apply this diff to remove line number annotations from the comments:

-    /**
-     * Visit a parse tree produced by `SchemaParser.document`.
-     * @param ctx the parse tree
-     * @return the visitor result
-     */
+    /**
+     * Visit a parse tree produced by `SchemaParser.document`.
+     * @param ctx the parse tree
+     * @return the visitor result
+     */
    visitDocument?: (ctx: DocumentContext) => Result;

-    /**
-     * Visit a parse tree produced by `SchemaParser.definitionList`.
-     * @param ctx the parse tree
-     * @return the visitor result
-     */
+    /**
+     * Visit a parse tree produced by `SchemaParser.definitionList`.
+     * @param ctx the parse tree
+     * @return the visitor result
+     */
    visitDefinitionList?: (ctx: DefinitionListContext) => Result;

-    /**
-     * Visit a parse tree produced by `SchemaParser.definition`.
-     * @param ctx the parse tree
-     * @return the visitor result
-     */
+    /**
+     * Visit a parse tree produced by `SchemaParser.definition`.
+     * @param ctx the parse tree
+     * @return the visitor result
+     */
    visitDefinition?: (ctx: DefinitionContext) => Result;

*... Apply similar changes for the remaining methods ...*

Likely invalid or redundant comment.

antlr/Schema.interp (3)

8-38: Verify the 'null' entries in token literal names

The token literal names section contains several 'null' entries (lines 15~ to 19~). Please ensure that these entries are intentional and won't cause issues during parsing, as unexpected null values might lead to runtime errors or incorrect tokenization.


55-81: Check the 'null' entries in token symbolic names

In the token symbolic names section, multiple null entries are present (lines 55~ to 59~). Verify that these null values are expected and that they do not adversely affect the lexer or parser operations.


84-107: Review the consistency of new rule names

The new rules added to the rule names section (lines 84~ to 107~) enhance the grammar's capabilities. Ensure that these rule names are consistently defined across all related files and that they align with the grammar specifications outlined in the PR objectives.

antlr/SchemaListener.ts (24)

6-29: Imports updated to reflect new grammar contexts

The updated import statements correctly include the new contexts from SchemaParser, aligning the SchemaListener with the revised grammar specifications. This ensures all necessary contexts are available for the interface methods.


38-46: Added enterDocument and exitDocument methods

The addition of enterDocument and exitDocument methods corresponds to the new document entry point in the grammar. The method signatures and JSDoc comments are correctly defined.


49-57: Added methods for DefinitionList context

The new methods enterDefinitionList and exitDefinitionList align with the updated grammar rule for definitionList. The additions are appropriate and properly documented.


60-68: Updated Definition methods

The enterDefinition and exitDefinition methods correctly replace the previous enterTypeDefinition and exitTypeDefinition methods, reflecting the changes in the grammar.


71-79: Introduced methods for TypeName context

The new enterTypeName and exitTypeName methods handle the parsing of type names in the updated grammar. The method definitions are accurate.


82-90: Added ObjectTypeDefinition methods

The methods enterObjectTypeDefinition and exitObjectTypeDefinition are correctly added to support object type definitions as per the new grammar rules.


93-101: Included FieldDefList methods

The introduction of enterFieldDefList and exitFieldDefList methods aligns with the updated handling of field definitions in the grammar.


104-112: Added methods for Identifier context

The enterIdentifier and exitIdentifier methods appropriately handle identifiers in the new grammar structure.


115-123: Included FieldDef methods

The addition of enterFieldDef and exitFieldDef methods reflects the updated grammar's approach to defining fields.


126-134: Added methods for Type context

The enterType and exitType methods are properly added to handle type parsing in the revised grammar.


137-145: Introduced NonUnionType methods

The methods for enterNonUnionType and exitNonUnionType are correctly defined. However, consider reviewing the naming convention used for related contexts.


159-167: Ensure implementations of TypeReference methods are updated

The methods enterTypeReference and exitTypeReference are essential for handling type references in the new grammar. Verify that all implementations of SchemaListener correctly implement these methods to prevent any runtime issues.


169-178: Added ObjectLiteralType methods

The methods for enterObjectLiteralType and exitObjectLiteralType are properly included to support object literal types in the grammar.


191-200: Included LiteralType methods

The addition of enterLiteralType and exitLiteralType methods supports parsing of literal types as per the updated grammar rules.


202-211: Added methods for BooleanLiteralType

The enterBooleanLiteralType and exitBooleanLiteralType methods are correctly added to handle boolean literals in the grammar.


213-222: Included NumberLiteralType methods

The methods enterNumberLiteralType and exitNumberLiteralType align with the new grammar rules for parsing number literals.


224-233: Added StringLiteralType methods

The addition of enterStringLiteralType and exitStringLiteralType methods supports string literal parsing in the updated grammar.


235-244: Introduced YorkieType methods

The methods for enterYorkieType and exitYorkieType are appropriately added, reflecting the integration of Yorkie-specific types into the grammar.


246-255: Added methods for YorkieObjectType

The enterYorkieObjectType and exitYorkieObjectType methods are correctly included to handle Yorkie object types.


257-266: Included YorkieArrayType methods

The addition of enterYorkieArrayType and exitYorkieArrayType methods corresponds to the new grammar rules for Yorkie array types.


268-277: Added methods for YorkieCounterType

The enterYorkieCounterType and exitYorkieCounterType methods are properly added to support Yorkie counter types in the grammar.


279-288: Included YorkieTextType methods

The methods enterYorkieTextType and exitYorkieTextType are correctly added to handle Yorkie text types.


290-299: Added methods for YorkieTreeType

The addition of enterYorkieTreeType and exitYorkieTreeType methods aligns with the grammar updates for Yorkie tree types.


Line range hint 38-299: Ensure all SchemaListener implementations are updated

Given the extensive changes to the SchemaListener interface, ensure that all classes implementing this interface have been updated accordingly. Missing implementations of the new methods or lingering old method implementations could lead to runtime errors.

Run the following script to identify any implementations that may need updates:

antlr/SchemaLexer.interp (5)

8-13: Addition of new token literal names aligns with objectives

The inclusion of token literals 'null', 'bigint', 'Uint8Array', 'Date', 'true', and 'false' enhances the lexer's ability to recognize these types, aligning with the PR objectives for schema validation.


20-24: Inclusion of Yorkie primitive types is appropriate

Adding 'yorkie.Object', 'yorkie.Array', 'yorkie.Counter', 'yorkie.Text', and 'yorkie.Tree' to the token literal names supports the validation of Yorkie's primitive types as specified in the PR objectives.


132-132: Validate the updated ATN for correctness

The ATN section starting at line 132~ has been significantly expanded. Please verify that the updated ATN accurately reflects the grammar changes and does not introduce any parsing errors.


14-19: ⚠️ Potential issue

Verify the 'null' entries in token literal names

There are several null entries at lines 15~ to 19~ in the token literal names. Please confirm whether these are intentional placeholders or if they should be replaced with actual token literals to prevent potential parsing issues.


55-59: ⚠️ Potential issue

Ensure all tokens have corresponding symbolic names

At line 55~, there's a null entry in the token symbolic names. Ensure that all new token literals have corresponding symbolic names to prevent any lexer ambiguities or mismatches.

antlr/SchemaLexer.ts (5)

30-56: Addition of new token constants is correct.

The new public static readonly token constants have been added sequentially and correctly numbered. They align with the updates in the grammar and will enable the lexer to recognize the new tokens accurately.


70-74: ruleNames array updated appropriately.

The ruleNames array has been updated to include the new tokens. This ensures that the lexer can correctly map the token indices to their corresponding rule names.


78-83: _LITERAL_NAMES array reflects new literals accurately.

The _LITERAL_NAMES array has been updated with the new token literals. This is essential for the lexer to provide correct string representations of tokens when needed.


87-91: _SYMBOLIC_NAMES array updated with new symbolic names.

The _SYMBOLIC_NAMES array now includes the new symbolic names for the tokens. This update is crucial for mapping token types to their symbolic names during the lexing process.


124-256: Serialized ATN updated to match grammar changes.

The _serializedATN has been updated to reflect the changes in the grammar. This ensures that the lexer operates correctly with the updated set of tokens and grammar rules.

Comment on lines +52 to +58
fieldDefList
: fieldDef ((COMMA | SEMICOLON | NEWLINE) fieldDef)*
;

identifier: IDENTIFIER;

fieldDef: identifier QUESTION? ':' type;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider standardizing field separators

The 'fieldDefList' rule currently allows for multiple separators (comma, semicolon, newline) between field definitions. While this provides flexibility, it may lead to inconsistencies in schema definitions.

Consider standardizing on a single separator for clarity and consistency. For example:

fieldDefList
    : fieldDef (COMMA fieldDef)*
    ;

This change would enforce a consistent style across schema definitions.

antlr/Schema.g4 Show resolved Hide resolved
src/validator.ts Show resolved Hide resolved
antlr/SchemaListener.ts Show resolved Hide resolved
antlr/SchemaLexer.interp Show resolved Hide resolved
antlr/SchemaParser.ts Show resolved Hide resolved
antlr/SchemaParser.ts Show resolved Hide resolved
antlr/SchemaParser.ts Show resolved Hide resolved
antlr/SchemaParser.ts Show resolved Hide resolved
antlr/SchemaParser.ts Show resolved Hide resolved
@hackerwins
Copy link
Member

This was merged by #4.

@hackerwins hackerwins closed this Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants