-
Notifications
You must be signed in to change notification settings - Fork 1
feat: Implement CLP type parser to support special chars besides _ in field name (addresses #13).
#31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: presto-0.293-clp-connector
Are you sure you want to change the base?
Conversation
WalkthroughAdds a Flex/Bison CLP type parser under velox/type/fbclp: build integration (CMake), generated lexer/parser, scanner and utility code, public parse API, and unit tests, all gated by VELOX_ENABLE_CLP_CONNECTOR. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant API as parseClpType
participant Scanner as ClpScanner (Flex)
participant Parser as ClpParser (Bison)
participant Util as ClpParserUtil
participant Registry as Type registry
Client->>API: typeText
API->>Scanner: init streams, output TypePtr&, input view
API->>Parser: parse(Scanner*)
loop tokenization/parsing
Scanner-->>Parser: token + yylval
Parser->>Util: typeFromString / customTypeWithChildren / inferTypeWithSpaces
Util->>Registry: getType(name, params)
Registry-->>Util: TypePtr
Util-->>Parser: TypePtr
end
Parser-->>API: parsed TypePtr
API-->>Client: TypePtr or error
rect rgba(220,240,255,0.35)
note right of Parser: On parse failure, ClpParser::error reports via VELOX_UNSUPPORTED with input context.
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 20
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (10)
velox/type/CMakeLists.txt(1 hunks)velox/type/fbclp/CMakeLists.txt(1 hunks)velox/type/fbclp/ClpParserUtil.cpp(1 hunks)velox/type/fbclp/ClpParserUtil.h(1 hunks)velox/type/fbclp/ClpScanner.h(1 hunks)velox/type/fbclp/ClpTypeParser.h(1 hunks)velox/type/fbclp/ClpTypeParser.ll(1 hunks)velox/type/fbclp/ClpTypeParser.yy(1 hunks)velox/type/fbclp/tests/CMakeLists.txt(1 hunks)velox/type/fbclp/tests/ClpTypeParserTest.cpp(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
velox/type/fbclp/ClpTypeParser.h (2)
velox/type/fbclp/ClpParserUtil.h (1)
velox/type/fbclp/ClpScanner.h (1)
velox/type/fbclp/ClpParserUtil.h (3)
velox/type/fbclp/ClpScanner.h (1)
velox/type/fbclp/ClpTypeParser.h (1)
velox/type/fbclp/ClpParserUtil.cpp (6)
typeFromString(22-37)typeFromString(22-24)customTypeWithChildren(39-51)customTypeWithChildren(39-41)inferTypeWithSpaces(53-66)inferTypeWithSpaces(53-55)
velox/type/fbclp/tests/ClpTypeParserTest.cpp (1)
velox/python/type/type.pyi (2)
BIGINT(21-21)VARCHAR(29-29)
velox/type/fbclp/ClpScanner.h (2)
velox/type/fbclp/ClpParserUtil.h (1)
velox/type/fbclp/ClpTypeParser.h (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: Build with GCC / Ubuntu release with resolve_dependency
🔇 Additional comments (5)
velox/type/fbclp/tests/CMakeLists.txt (1)
15-25: LGTM.Linking against velox_type_fbclp_parser and GTest targets looks correct. No issues.
velox/type/fbclp/ClpTypeParser.ll (1)
60-62: Keep the yyFlexLexer::yylex guard; OK as-is.Throwing here is standard to prevent accidental use of the base yylex.
velox/type/fbclp/ClpTypeParser.yy (2)
154-156: Good error propagation.Raising
VELOX_UNSUPPORTEDwith the original input is clear and consistent with Velox errors.
149-150: Confirmed — lexer returns quoted lexeme; parser stripping is correct.QUOTED_ID is defined in velox/type/fbclp/ClpTypeParser.ll (line 37) and the lexer action returns YYText() unchanged (line 54), so the parser's $1.erase(0,1); $1.pop_back(); in ClpTypeParser.yy is required and will not double-strip. The QUOTED_ID regex only allows alnum/space/underscore inside quotes, so there are no escape sequences to unescape; if you need support for escaped quotes/backslashes, extend the lexer regex and add unescaping in the parser.
velox/type/fbclp/CMakeLists.txt (1)
26-31: Confirm Flex/Bison symbol prefixing matches ClpScanner.Ensure velox/type/fbclp/ClpScanner.h (or the generated scanner header) contains
#define yyFlexLexer veloxtpclpFlexLexerbefore includingFlexLexer.hso the veloxtpclp-prefixed scanner matches the base class. Repository search found no such define — verify and add if missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 22
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (10)
velox/type/CMakeLists.txt(1 hunks)velox/type/fbclp/CMakeLists.txt(1 hunks)velox/type/fbclp/ClpParserUtil.cpp(1 hunks)velox/type/fbclp/ClpParserUtil.h(1 hunks)velox/type/fbclp/ClpScanner.h(1 hunks)velox/type/fbclp/ClpTypeParser.h(1 hunks)velox/type/fbclp/ClpTypeParser.ll(1 hunks)velox/type/fbclp/ClpTypeParser.yy(1 hunks)velox/type/fbclp/tests/CMakeLists.txt(1 hunks)velox/type/fbclp/tests/ClpTypeParserTest.cpp(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
velox/type/fbclp/ClpTypeParser.h (1)
velox/type/fbclp/ClpParserUtil.h (1)
velox/type/fbclp/ClpParserUtil.h (3)
velox/type/fbclp/ClpScanner.h (1)
velox/type/fbclp/ClpTypeParser.h (1)
velox/type/fbclp/ClpParserUtil.cpp (6)
typeFromString(22-37)typeFromString(22-24)customTypeWithChildren(39-51)customTypeWithChildren(39-41)inferTypeWithSpaces(53-66)inferTypeWithSpaces(53-55)
velox/type/fbclp/tests/ClpTypeParserTest.cpp (1)
velox/python/type/type.pyi (2)
BIGINT(21-21)VARCHAR(29-29)
velox/type/fbclp/ClpScanner.h (2)
velox/type/fbclp/ClpParserUtil.h (1)
velox/type/fbclp/ClpTypeParser.h (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: conventional-commits
- GitHub Check: pre-commit
- GitHub Check: Build with GCC / Ubuntu release with resolve_dependency
🔇 Additional comments (7)
velox/type/fbclp/tests/CMakeLists.txt (1)
15-25: Confirm tests are only added when VELOX_BUILD_TESTING and CLP are enabled.Looks good, but ensure the parent CMakeLists.txt adds this tests directory only when VELOX_BUILD_TESTING is ON and VELOX_ENABLE_CLP_CONNECTOR is enabled; otherwise CI systems that disable tests or CLP could see spurious targets.
velox/type/fbclp/ClpTypeParser.ll (3)
39-40: Order is fine; keep VARIABLE before WORD.Token priority is correct; no change needed.
60-63: yylex override is fine.Throwing on default yylex is acceptable to catch miswiring.
66-76: parseClpType implementation looks good; add defensive error context.Consider enriching the VELOX_CHECK failure with parser position when available, but current message is acceptable.
velox/type/fbclp/ClpTypeParser.yy (3)
97-99: varchar(var) length ignored by design — confirm this matches Velox semanticsYou parse
(NUMBER)but discard it. That aligns with existing Velox parsers; just confirm tests cover bothvarcharandvarchar(n).
83-91: Field names with special chars depend on the scanner — ensure unquoted -, @, #, $, \ are emitted as WORD/VARIABLE when allowedGrammar is ready (field_name accepts WORD), but support hinges on ClpTypeParser.ll. Please confirm tokens for unquoted identifiers with these chars are produced only where intended (e.g., not conflicting with minus).
Also applies to: 146-150
154-156: Good error surfacingClear context in errors via scanner->input(). LGTM.
|
Going to fix the coderabbitai comments first. |
|
Most of comments from coderabbitai are related to the code I coped from |
|
Fixed some of suggestions in CMake and comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
velox/type/fbclp/tests/ClpTypeParserTest.cpp (1)
26-39: Add negative and nested coverage (quoted/illegal-char/nested ROW/ARRAY/MAP).Strengthen regression guarantees with explicit failing and nested cases as previously suggested.
I can draft targeted TEST_F cases for:
- Quoted identifiers with spaces and special chars.
- An illegal unquoted char to assert a clean failure.
- Nested types with special-char field names.
Want me to push these?velox/type/fbclp/ClpTypeParser.h (2)
32-36: Clarify quoted identifier support in docs.State that quoted names accept any character except the quote delimiter.
Apply:
-/// Quoted field names are supported. +/// Quoted field names are supported; inside quotes any character is allowed +/// except the terminating quote delimiter.
19-21: Change parseClpType to take std::string_view; update generated parserUse std::string_view to avoid copies and match ClpScanner. In velox/type/fbclp/ClpTypeParser.h add #include <string_view> and change
TypePtr parseClpType(const std::string& typeText);
to
TypePtr parseClpType(std::string_view typeText);
Also update velox/type/fbclp/ClpTypeParser.ll (or regenerate from ClpTypeParser.yy) so the definition matches, and ensure the implementation does not store/assume ownership of the incoming characters (lifetime safety). Call sites found: velox/type/fbclp/tests/ClpTypeParserTest.cpp (no change required for string literals).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
velox/type/CMakeLists.txt(1 hunks)velox/type/fbclp/ClpTypeParser.h(1 hunks)velox/type/fbclp/tests/ClpTypeParserTest.cpp(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
velox/type/fbclp/ClpTypeParser.h (2)
velox/type/fbclp/ClpParserUtil.h (1)
velox/type/fbclp/ClpScanner.h (1)
velox/type/fbclp/tests/ClpTypeParserTest.cpp (1)
velox/python/type/type.pyi (2)
BIGINT(21-21)VARCHAR(29-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build with GCC / Ubuntu release with resolve_dependency
🔇 Additional comments (1)
velox/type/CMakeLists.txt (1)
21-23: Approve — VELOX_ENABLE_CLP_CONNECTOR is defined and fbclp gating is correct.
Option defined at CMakeLists.txt:130 (default ON); gating present in velox/type/CMakeLists.txt:21 — prevents accidental toolchain requirements and keeps non‑CLP builds clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
velox/type/fbclp/tests/ClpTypeParserTest.cpp (1)
26-39: Add negative, quoted-with-spaces, and nested coverage.These cement lexer/parser behaviour and guard against regressions.
Apply:
@@ TEST_F(TypeParserTest, rowTypeWithSpecialChars) { @@ } +TEST_F(TypeParserTest, quotedIdentifiersAndWhitespace) { + ASSERT_EQ( + *parseClpType(R"(row("hy-phen" varchar,"sp ace" varchar,"$$" varchar))"), + *ROW({"hy-phen", "sp ace", "$$"}, {VARCHAR(), VARCHAR(), VARCHAR()})); +} + +TEST_F(TypeParserTest, nestedTypesWithSpecialCharNames) { + ASSERT_EQ( + *parseClpType( + R"(row("$root$" row("-arr-" array(varchar), "#m#" map(varchar, bigint))))"), + *ROW( + {"$root$"}, + {ROW({"-arr-", "#m#"}, {ARRAY(VARCHAR()), MAP(VARCHAR(), BIGINT())})})); +} + +TEST_F(TypeParserTest, rejectsIllegalIdentifierChar) { + // '!' is still illegal; ensure fast failure. + VELOX_ASSERT_THROW(parseClpType(R"(row(inva!id bigint))"), "invalid"); +}
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
velox/type/fbclp/tests/ClpTypeParserTest.cpp(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
velox/type/fbclp/tests/ClpTypeParserTest.cpp (1)
velox/python/type/type.pyi (2)
BIGINT(21-21)VARCHAR(29-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build with GCC / Ubuntu release with resolve_dependency
🔇 Additional comments (2)
velox/type/fbclp/tests/ClpTypeParserTest.cpp (2)
26-33: Good unquoted coverage for -, @, #, $, and .This locks in the lexer change for unquoted identifiers and matches expected field names/types.
26-39: Confirm additional parser tests exist (quoted / whitespace / nested / negative cases)Automated search found no matches under velox/type/fbclp/tests — confirm these tests were added in another commit/location or add them under velox/type/fbclp/tests (see ClpTypeParserTest.cpp lines 26–39).
|
Ready fto review. |
|
"addresses" is not a keyword that GitHub recognizes: https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword |
Description
The corresponding Presto PR: y-scope/presto#67
This PR addressed #13 .
This PR copied the original type parser so that we could do some customization for CLP connector without affecting other connectors. The current customization we have made is adding special chars including
-,@,#,$and\.Then we can use this new parser in the protocol instead of using original parser which cannot recogonize those speical chars.
The protocol PR: y-scope/presto#67
Checklist
breaking change.
Validation performed
Added unit tests.
Passed the CI.
Passed the end-to-end testing. (t."$date" is querable).
Summary by CodeRabbit
New Features
Build
Tests