Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Like function: Error while using double escape char in pattern string #8040

Closed

Conversation

zhztheplayer
Copy link
Contributor

Expression:

like(R"(abcdef\abcdef)", R"(%\\abc%)", '\\')

Error:

Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Escape character must be followed by '%', '_' or the escape character itself
Retriable: False
Expression: current == escapeChar || current == '_' || current == '%'
Function: unescape
File: ../../velox/functions/lib/Re2Functions.cpp

The bug is a regression caused by #7730

Copy link

netlify bot commented Dec 14, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit fcf27fd
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/657af6e42cbb3f000865f0b6

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 14, 2023
@zhztheplayer
Copy link
Contributor Author

Hi @mbasmanova and @xumingming, please take a look. Thanks!

zhztheplayer added a commit to zhztheplayer/velox that referenced this pull request Dec 14, 2023
@@ -1170,7 +1176,16 @@ PatternMetadata determinePatternKind(
} else {
// Record the first fixed pattern start.
if (fixedPatternStart == -1) {
fixedPatternStart = iterator.currentIndex();
if (iterator.isEscaped()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. Thank you for the fix.

currentIndex() method and this logic beaks the abstraction, the promise that iterator handles escape characters transparently.

Can we maybe replace currentIndex() with currentStart() and nextStart() methods that return the start position of the current logical character and start position of the next logical character. Or something similar that doesn't require the caller to handle escape character explicitly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhztheplayer thanks for the catch!

Um, just like @mbasmanova pointed out, the key here is currentIndex() breaks the abstraction, current simple way of parsing the pattern is rigid so it depends on currentIndex(). I have an incoming PR which will support optimizations for more relaxed pattern like _a_b% which will require more flexible parsing of the pattern. In that PR I remove the dependency on currentIndex(), just rely on current() to tell me what current char is, so after collecting all the chars from current(), it is the unescaped string we are looking for.

So my suggestion is:

  • I submit a PR to disable the optimization for escaped scenario for the moment.
  • In my PR to support relaxed pattern, fix this issue thoroughly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion.

I looked at the code but found that we don't appear to require for nextStart(), currentStart() could be enough. Let's only add one for now to keep the code simple?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. Thanks.

Copy link
Contributor Author

@zhztheplayer zhztheplayer Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xumingming Hi Mingming, sorry for missing your message. So would you mean to fix the issue in your incoming PR instead? If yes, it should be OK to me but the regression blocks our daily testing which is a little bit hurry for us to have a fix landed. Is it possible to use this PR then rebase your patch onto it, if it works for you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhztheplayer Sure, go ahead.


int32_t currentIndex_{-1};
size_t currentStart_{0};
size_t nextStart_{0};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nextStart_ is just a internal state

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


// Index of the first % or _ character(not escaped).
int32_t wildcardStart = -1;
size_t wildcardStart = -1;
Copy link
Contributor

@xumingming xumingming Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wildcardStart is an index, if it is not found, it could be negative(-1), while size_t is unsigned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'll change it back to int32_t. Just missed that the values are -1s.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in 840aefd.

Copy link

Conbench analyzed the 1 benchmark run on commit 840aefd4.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants