Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat(function): Handle unescaped UTF-8 characters in Presto url_extract_* UDFs (#11535)

Summary: Presto Java accepts UTF-8 characters that are not control or whitespace characters anywhere in a URL where a %-escaped character can appear. This change modifies Velox's URIParser to do the same. It also fixes a bug where Velox's URIParser produced incorrect results whenever any non-ASCII character appeared anywhere in the URL.

To support this, I modified the tryGetCharLength helper function in UTF8Utils to take an int32_t reference, which it populates with the code point if the UTF-8 character is valid. The function was already computing this value and discarding it; returning it avoids a second call to repeat those steps and is consistent with the Airlift function on which it is based.

Reviewed By: xiaoxmeng, kgpai

Differential Revision: D65927918
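The summary combines two ideas: a UTF-8 decoder that returns both the character's byte length and its code point, and a rule that lets a non-ASCII code point appear unescaped as long as it is neither a control nor a whitespace character. The C++ below is a minimal sketch of both under stated assumptions. The names `decodeUtf8Char`, `isIsoControl`, `isSpaceChar`, `isAllowedUnescaped`, and `consumeUnescapedUtf8` are hypothetical and are not Velox's API; the real helper is `tryGetCharLength` in UTF8Utils, whose exact signature is not shown here. The control/whitespace checks are assumed to mirror Java's `Character.isISOControl` and `Character.isSpaceChar`, which `java.net.URI` uses to define the characters it allows unescaped.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Velox's tryGetCharLength): returns the byte
// length (1-4) of the UTF-8 character at `input` and stores its code point
// in `codePoint`. Returns -1 for invalid input; `codePoint` is then unspecified.
int32_t decodeUtf8Char(const char* input, int64_t size, int32_t& codePoint) {
  if (size <= 0) {
    return -1;
  }
  const auto* bytes = reinterpret_cast<const unsigned char*>(input);
  const unsigned char first = bytes[0];
  int32_t length;
  int32_t minValue;  // smallest code point for this length; rejects overlong forms
  if (first < 0x80) {
    codePoint = first;
    return 1;
  } else if ((first & 0xE0) == 0xC0) {
    length = 2;
    codePoint = first & 0x1F;
    minValue = 0x80;
  } else if ((first & 0xF0) == 0xE0) {
    length = 3;
    codePoint = first & 0x0F;
    minValue = 0x800;
  } else if ((first & 0xF8) == 0xF0) {
    length = 4;
    codePoint = first & 0x07;
    minValue = 0x10000;
  } else {
    return -1;  // stray continuation byte or invalid lead byte
  }
  if (size < length) {
    return -1;  // truncated sequence
  }
  for (int32_t i = 1; i < length; ++i) {
    if ((bytes[i] & 0xC0) != 0x80) {
      return -1;  // expected a continuation byte
    }
    codePoint = (codePoint << 6) | (bytes[i] & 0x3F);
  }
  if (codePoint < minValue || codePoint > 0x10FFFF ||
      (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
    return -1;  // overlong encoding, out of range, or UTF-16 surrogate
  }
  return length;
}

// Same ranges as Java's Character.isISOControl: U+0000-U+001F, U+007F-U+009F.
bool isIsoControl(int32_t cp) {
  return (cp >= 0x00 && cp <= 0x1F) || (cp >= 0x7F && cp <= 0x9F);
}

// Approximates Java's Character.isSpaceChar (Unicode categories Zs, Zl, Zp).
bool isSpaceChar(int32_t cp) {
  return cp == 0x20 || cp == 0xA0 || cp == 0x1680 ||
         (cp >= 0x2000 && cp <= 0x200A) || cp == 0x2028 || cp == 0x2029 ||
         cp == 0x202F || cp == 0x205F || cp == 0x3000;
}

// Assumed rule: a non-ASCII code point may appear unescaped wherever a
// %-escape may appear, unless it is a control or whitespace character.
bool isAllowedUnescaped(int32_t cp) {
  return cp > 0x7F && !isIsoControl(cp) && !isSpaceChar(cp);
}

// Returns the number of bytes an unescaped non-ASCII character occupies at
// `input`, or 0 if the bytes are ASCII, invalid, or a disallowed character.
// Decoding once yields both the length and the code point.
int32_t consumeUnescapedUtf8(const char* input, int64_t size) {
  int32_t codePoint = 0;
  const int32_t length = decodeUtf8Char(input, size, codePoint);
  if (length > 1 && isAllowedUnescaped(codePoint)) {
    return length;
  }
  return 0;
}
```

In this sketch a parser would call `consumeUnescapedUtf8` at any position where it would otherwise expect `%XX`; for example, `"\xC3\xA9"` (é) is consumed as 2 bytes, while `"\xC2\xA0"` (no-break space) is rejected. Returning the code point from the length check is what lets the classification run without decoding the bytes a second time, which matches the motivation given above.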
1 parent f9cbfd0, commit 6fe67de
Showing 8 changed files with 176 additions and 66 deletions.