Ignore time zones that are not recognizable by OS when building time zone database #10654
velox/type/tz/TimeZoneMap.cpp
} catch (std::runtime_error& err) {
  // Timezone not found in OS, skip it.
  LOG(WARNING) << "Timezone [" << entry.second << "] not found due to: '"
               << err.what() << "', ignoring it. ";
  continue;
}
@pedroerp Do you think it's OK to modify tz.h / tz.cpp? We could perhaps add a new function, like bool has_zone(string tz_name), to avoid catching a runtime_error here.
We made some other changes to that library, so it should be ok. Just make sure you capture the changes in a .patch file
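A minimal sketch of the has_zone() idea raised above, not the actual tz.h change. `ZoneRegistry` and its members are hypothetical stand-ins that work on an in-memory set of names, whereas the real library resolves names against the OS time zone database. The point is only that a boolean probe lets the caller skip a missing zone without using an exception for control flow.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the zone lookup in tz.h; the real library
// reads the OS time zone database instead of an in-memory set.
class ZoneRegistry {
 public:
  explicit ZoneRegistry(std::set<std::string> names)
      : names_(std::move(names)) {}

  // Non-throwing probe, in the spirit of the proposed has_zone().
  bool has_zone(const std::string& tz_name) const {
    return names_.count(tz_name) > 0;
  }

  // Throwing lookup, mirroring the behaviour the PR had to catch.
  const std::string& locate_zone(const std::string& tz_name) const {
    auto it = names_.find(tz_name);
    if (it == names_.end()) {
      throw std::runtime_error(tz_name + " not found in timezone database");
    }
    return *it;
  }

 private:
  std::set<std::string> names_;
};

// Usage: probe first, so a missing zone never raises.
inline void usage_sketch() {
  std::set<std::string> names{"America/Los_Angeles", "Asia/Tokyo"};
  ZoneRegistry reg(names);
  assert(reg.has_zone("Asia/Tokyo"));
  assert(!reg.has_zone("America/Non_Exist"));
}
```

The thread later settles on a dedicated exception type caught by the caller rather than a probe; both approaches avoid catching a bare runtime_error.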
Also, what happens when API clients try to convert a timezone not found?
Also, what happens when API clients try to convert a timezone not found?
Do you mean when calling the APIs in TimeZoneMap.h, for example?
In that case a VeloxUserError like the following one is raised:
unknown file: Failure
C++ exception with description "Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Unknown time zone: 'America/Non_Exist'
Retriable: False
Function: locateZone
File: /opt/gluten/ep/build-velox/build/velox_ep/velox/type/tz/TimeZoneMap.cpp
Line: 276
@zhztheplayer there are two different things in this API:
1. The date::time_zone* pointer, which does the actual time zone conversion.
2. The TimeZone object, which carries the date::time_zone* pointer above.
These two used to be coupled, so you could only have 2 with 1, and the TimeZone class has an internal assumption that if 1 is nullptr, it goes ahead and performs to_sys() and to_local() based on offsets.
Now, with this PR, we are making a change that allows 2 to be created without 1. We need additional logic to prevent time zone conversion when the internal date::time_zone pointer doesn't exist. We still need to allow 2 to exist without 1 if we want to support time zones that are missing from the local tzdata database, but we need to add code that prevents time zone conversion from happening with these "empty" TimeZone objects.
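The guard being asked for can be sketched as follows. This is a simplified model under stated assumptions, not Velox's TimeZone class: `ZoneRules` is a toy stand-in for date::time_zone (one fixed offset, no DST), and `SketchTimeZone` with its members is a hypothetical name. It shows an object that has neither a fixed offset nor a rules pointer failing with a descriptive error instead of dereferencing nullptr.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-in for date::time_zone: one fixed UTC offset, no DST rules.
struct ZoneRules {
  std::chrono::minutes utcOffset;
};

// Hypothetical simplified TimeZone: a fixed offset ("+09:00"), a named
// zone with loaded rules, or a named zone whose rules are missing.
class SketchTimeZone {
 public:
  SketchTimeZone(std::string name, std::chrono::minutes offset)
      : name_(std::move(name)), hasOffset_(true), offset_(offset) {}
  SketchTimeZone(std::string name, const ZoneRules* rules)
      : name_(std::move(name)), rules_(rules) {}

  // Convert local seconds to UTC seconds.
  std::chrono::seconds toSys(std::chrono::seconds local) const {
    if (hasOffset_) {
      return local - std::chrono::seconds(offset_);
    }
    if (rules_ == nullptr) {
      // Guard: the zone was skipped at load time, so refuse to convert.
      throw std::invalid_argument(
          "Time zone '" + name_ + "' is not available in the local tzdata");
    }
    return local - std::chrono::seconds(rules_->utcOffset);
  }

 private:
  std::string name_;
  bool hasOffset_ = false;
  std::chrono::minutes offset_{0};
  const ZoneRules* rules_ = nullptr;
};

// Usage: a fixed-offset zone converts without any rules pointer.
inline void usageSketch() {
  SketchTimeZone fixed("+09:00", std::chrono::minutes(540));
  assert(fixed.toSys(std::chrono::seconds(32400)).count() == 0);
}
```

Keeping the offset flag separate from the pointer is the design choice that matters here: a null pointer alone can no longer be read as "this is an offset-based zone".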
Thank you for the explanation, which helps me understand the code's history a lot.
So, IIUC, one of the intentions of #10577 was to allow Velox to convert time zones that don't exist in the OS's timezone list, so we'd rather let the conversion continue than throw an error. Am I understanding correctly?
The behaviour in Gluten before #10577:
1. When the native library is loaded (process startup): No error
2. Converting a non-existent time zone: Error
The behaviour in Gluten after #10577:
1. When the native library is loaded (process startup): Error (some zones don't exist in the local tz list)
2. Converting a non-existent time zone: N/A
This PR so far:
1. When the native library is loaded (process startup): No error (warning instead)
2. Converting a non-existent time zone: Error
So I think we should make both 1 and 2 pass without throwing, which was the intention of #10577 and the relevant issues. Is that correct?
So, IIUC, one of the intentions of #10577 was to allow Velox to convert time zones that don't exist in the OS's timezone list, so we'd rather let the conversion continue than throw an error. Am I understanding correctly?
No, these are different. The PR above is to support conversions using fixed time zone offsets (things like "+09:00" and "-03:00"). These also don't exist in the time zone database, but they are different from a real time zone (say, "America/Los_Angeles") that happens not to be available in the time zone database. For the latter case, we need to fail the conversion, since we don't know the time zone's potential daylight saving schedule.
970412c to 88538f2
I've updated the PR; the CI error is unrelated, I guess. @pedroerp Thanks
Kindly pinging @pedroerp in case you missed this.
What happens when you try to use a timezone that could not be loaded? Could you double check if we have an assertion that makes it fail with a descriptive message, and prevent the binary from just crashing?
    : public std::runtime_error
{
public:
    invalid_timezone(const std::string& tz_name);
nit: why not just define the body inline here?
I was just following the file's code style; tz.h doesn't seem to use inline definitions.
velox/type/tz/TimeZoneMap.cpp
} catch (date::invalid_timezone& err) {
  // Timezone not found in OS, skip it.
  LOG(WARNING) << "Timezone [" << entry.second << "] not found due to: '"
               << err.what() << "', ignoring it. ";
Concatenating these two messages will read oddly. What about something like:
Unable to load "[timezone]" from local timezone database: error_msg
done
There are two cases we need to make sure we cover:
2 wouldn't happen before your PR, so it's likely that we don't have a check for it. We will need to check both locateZone() functions: the one that searches based on a time zone name, and the one based on a time zone ID.
@pedroerp Did you mean to add a test to emulate the case that I described in the PR description? If so, I've added one in the latest commits.
@pedroerp I would like to revisit this, and I appreciate your help in advance.
If by velox/velox/type/tz/TimeZoneMap.cpp Lines 74 to 85 in 1b64d39
There is a Are we aligned here so far? |
@zhztheplayer I'm just coming back from PTO, sorry about the delay on this.
Thank you for the iterations. Your comment is right, thanks for clarifying. Once we address the two small comments we can tag it as ready to merge.
  // Timezone not found in OS, skip it.
  LOG(WARNING) << "Unable to load [" << entry.second
               << "] from local timezone database: " << err.what();
  continue;
@zhztheplayer You're right, in this case we leave the entire TimeZone pointer as null, so it will do the right thing. Thanks for clarifying.
It would probably be worth adding a small comment here explaining that in this case we continue the iteration and leave the pointer unchanged (nullptr), for the next unadvised reader like myself :)
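The skip-and-continue pattern discussed here can be sketched as below. Everything in it is a hypothetical stand-in: `MissingZone` plays the role of date::invalid_timezone, `lookupInOs` replaces the real OS lookup with a search in an in-memory set, `buildDatabase` is an illustrative name, and std::cerr is used where the real code uses LOG(WARNING). Zone IDs are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception, playing the role of date::invalid_timezone.
struct MissingZone : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical OS lookup: throws when the name is not installed locally.
inline const std::string* lookupInOs(
    const std::set<std::string>& installed, const std::string& name) {
  auto it = installed.find(name);
  if (it == installed.end()) {
    throw MissingZone(name + " not found in timezone database");
  }
  return &*it;
}

// Build a table indexed by zone ID. Entries for zones the OS does not
// know stay nullptr, so startup succeeds and only later use can fail.
inline std::vector<const std::string*> buildDatabase(
    const std::map<int, std::string>& idToName,
    const std::set<std::string>& installed) {
  const int maxId = idToName.empty() ? -1 : idToName.rbegin()->first;
  std::vector<const std::string*> table(
      static_cast<std::size_t>(maxId + 1), nullptr);
  for (const auto& entry : idToName) {
    try {
      table[entry.first] = lookupInOs(installed, entry.second);
    } catch (const MissingZone& err) {
      // Continue the iteration and leave this slot unchanged (nullptr).
      std::cerr << "Unable to load [" << entry.second
                << "] from local timezone database: " << err.what() << '\n';
      continue;
    }
  }
  return table;
}

// Usage: one installed zone, one missing zone.
inline void usageSketch() {
  const std::set<std::string> installed{"Asia/Tokyo"};
  const std::map<int, std::string> ids{
      {0, "Asia/Tokyo"}, {1, "America/Non_Exist"}};
  const auto table = buildDatabase(ids, installed);
  assert(table[0] != nullptr);
  assert(table[1] == nullptr);
}
```

Because the missing entry stays nullptr rather than being replaced by a placeholder object, a later lookup of that ID can detect the gap and report an unknown time zone.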
velox/type/tz/TimeZoneMap.cpp
@@ -39,6 +42,12 @@ inline std::chrono::minutes getTimeZoneOffset(int16_t tzID) {
  return std::chrono::minutes{(tzID <= 840) ? (tzID - 841) : (tzID - 840)};
}

const date::time_zone* locateZoneExt(std::string_view tz_name) {
For consistency with other parts of the codebase, maybe call it locateZoneImpl()?
Thank you
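As a worked example of the getTimeZoneOffset() arithmetic in the excerpt above, the function is reproduced here with a few sample IDs. The ID range 1 to 1680 is an assumption inferred from the constants 840 and 841 (840 minutes is 14 hours), not something stated in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Same arithmetic as the excerpt. Under the assumed ID range, IDs 1..840
// map to -840..-1 minutes and IDs 841..1680 map to +1..+840 minutes.
inline std::chrono::minutes getTimeZoneOffset(std::int16_t tzID) {
  return std::chrono::minutes{(tzID <= 840) ? (tzID - 841) : (tzID - 840)};
}

// Usage: the two ends of the range and the two IDs around the sign change.
inline void usageSketch() {
  assert(getTimeZoneOffset(1).count() == -840);   // -14:00
  assert(getTimeZoneOffset(840).count() == -1);   // -00:01
  assert(getTimeZoneOffset(841).count() == 1);    // +00:01
  assert(getTimeZoneOffset(1680).count() == 840); // +14:00
}
```

Within that range the formula never yields a zero offset, since the value jumps from -1 to +1 between IDs 840 and 841.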
Thanks, but the UT had failures, so I'll take another look at that.
f1b814a to e9f3be1
Fixed now; it was a rebase issue.
@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
…zone database (facebookincubator#10654)
Summary: Related to discussion facebookincubator#10577 (comment)
The patch fixes fatal error `<zone id> not found in timezone database` when at least one of the timezone IDs in [file](https://github.com/facebookincubator/velox/blob/main/velox/type/tz/TimeZoneDatabase.cpp) don't exist in OS's supported timezone list.
Pull Request resolved: facebookincubator#10654
Reviewed By: kgpai
Differential Revision: D62785133
Pulled By: pedroerp
fbshipit-source-id: c8454454750040f8fdcbe053084f505d679f3142
Related to discussion #10577 (comment)
The patch fixes the fatal error "<zone id> not found in timezone database", which occurs when at least one of the timezone IDs in file doesn't exist in the OS's supported timezone list.