Revamp Timezone Processing and Management #8037
Comments
Hi @pedroerp, does Velox's external/date come from this lib: https://github.com/HowardHinnant/date? I'm not sure whether its latest versions understand more timezone patterns. In addition, I noticed that USE_OS_TZDB is used to build this external/date library (https://github.com/facebookincubator/velox/blob/main/velox/external/date/CMakeLists.txt#L14), which makes it use only the OS-supplied timezone database. That may cause some limitations.
@PHILO-HE Yes, it was copied from that repo.
Which limitations do you have in mind? The OS timezone database should be our source of authority for timezone conversion, IMO.
Hi @pedroerp, I investigated the external/date lib recently. In the previous discussion, I guessed that the OS timezone database has some limitation that prevents it from recognizing offset-based timezones like +08:00. But even when I set AUTO_DOWNLOAD=1 and HAS_REMOTE_API=1 so that the library uses a downloaded timezone database, such offset-based timezones are still not recognized (see PHILO-HE@b98d593). I submitted an issue to that community to discuss offset-based timezones: HowardHinnant/date#823. It seems we can first convert the offset to "Etc/GMT+x" (this only works for offsets that are a whole number of hours).
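One detail matters for the "Etc/GMT+x" conversion: IANA tzdata uses the POSIX sign convention for the Etc/GMT zones, so the sign is inverted. UTC+8 is "Etc/GMT-8" and UTC-5 is "Etc/GMT+5", and tzdata only provides Etc/GMT-14 through Etc/GMT+12. A minimal sketch of the conversion follows; offsetToEtcGmt is a hypothetical helper, not an existing Velox or external/date API, and it assumes offsets arrive in the "[+-]HH:MM" form.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: maps a whole-hour offset such as "+08:00" to the
// IANA "Etc/GMT" zone with the same UTC offset. Returns std::nullopt when
// the input is not "[+-]HH:MM", when minutes are non-zero, or when the
// offset falls outside the Etc/GMT-14 .. Etc/GMT+12 range in tzdata.
std::optional<std::string> offsetToEtcGmt(std::string_view offset) {
  if (offset.size() != 6 || (offset[0] != '+' && offset[0] != '-') ||
      offset[3] != ':') {
    return std::nullopt;
  }
  auto isDigit = [](char c) { return c >= '0' && c <= '9'; };
  if (!isDigit(offset[1]) || !isDigit(offset[2]) || !isDigit(offset[4]) ||
      !isDigit(offset[5])) {
    return std::nullopt;
  }
  const int hours = (offset[1] - '0') * 10 + (offset[2] - '0');
  const int minutes = (offset[4] - '0') * 10 + (offset[5] - '0');
  if (minutes != 0) {
    return std::nullopt;  // e.g. "+05:30" needs a minute-precision zone
  }
  // Signed offset east of UTC, in hours.
  const int east = offset[0] == '+' ? hours : -hours;
  if (east < -12 || east > 14) {
    return std::nullopt;
  }
  if (east == 0) {
    return std::string("Etc/GMT");
  }
  // POSIX convention: the sign in Etc/GMT names is inverted.
  return std::string("Etc/GMT") + (east > 0 ? "-" : "+") +
         std::to_string(east > 0 ? east : -east);
}
```

With this, "+08:00" becomes "Etc/GMT-8" and "-05:00" becomes "Etc/GMT+5", while "+05:30" is rejected and would need a different mechanism.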
@PHILO-HE @pedroerp That's quite interesting. Maybe implementing such a conversion would be a good incremental first step? That is, a function to convert whole-hour offsets to "Etc/GMT+x", plus a custom time zone class with minute precision, as outlined in the documentation, for the other cases. I could start putting that together if we want to go down that route.
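For offsets that are not whole hours (e.g. "-07:30"), a fixed-offset zone only needs to add or subtract a constant. The sketch below is a hypothetical, library-free FixedOffsetZone class; it is not the custom zone from the external/date documentation, and it represents instants as plain second counts since the Unix epoch for simplicity. The ±14:00 bound is an assumption.

```cpp
#include <chrono>
#include <initializer_list>
#include <optional>
#include <string_view>

// Hypothetical minute-precision fixed-offset zone. Because the offset never
// changes, there are no DST gaps or overlaps, so both directions of the
// conversion are always unambiguous.
class FixedOffsetZone {
 public:
  explicit FixedOffsetZone(std::chrono::minutes offset) : offset_(offset) {}

  // Parses "[+-]HH:MM" (e.g. "-07:30"); returns std::nullopt otherwise.
  static std::optional<FixedOffsetZone> parse(std::string_view s) {
    if (s.size() != 6 || (s[0] != '+' && s[0] != '-') || s[3] != ':') {
      return std::nullopt;
    }
    for (int i : {1, 2, 4, 5}) {
      if (s[i] < '0' || s[i] > '9') {
        return std::nullopt;
      }
    }
    const int hh = (s[1] - '0') * 10 + (s[2] - '0');
    const int mm = (s[4] - '0') * 10 + (s[5] - '0');
    // Assumed bounds: minutes below 60, at most 14 hours away from UTC.
    if (mm >= 60 || hh * 60 + mm > 14 * 60) {
      return std::nullopt;
    }
    const std::chrono::minutes total(hh * 60 + mm);
    return FixedOffsetZone(s[0] == '-' ? -total : total);
  }

  std::chrono::minutes offset() const { return offset_; }

  // UTC instant -> local wall-clock seconds: add the offset.
  std::chrono::seconds toLocal(std::chrono::seconds utc) const {
    return utc + offset_;
  }

  // Local wall-clock seconds -> UTC instant: subtract the offset.
  std::chrono::seconds toUtc(std::chrono::seconds local) const {
    return local - offset_;
  }

 private:
  std::chrono::minutes offset_;
};
```

For example, parsing "-07:30" yields an offset of -450 minutes, so the epoch (0 seconds UTC) maps to -27000 local seconds.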
Hi @svm1, I have a PR to support timezones like "GMT+8". See #9591. We can do a similar conversion for other whole-hour offsets, like "+08:00", in a separate PR.
Sorry, I must've missed that. Makes sense!
Description
There are a few important aspects regarding timezone processing in Velox:
1. To represent timezones efficiently, Velox maps them to 16-bit integers. Because the mapping must be compatible with data saved on disk by Presto Java (e.g. the TimestampWithTimezone type), we import the mapping between timezone names (strings) and integers from Presto. The mapping is automatically generated and available here:
   Naturally, both directions of the mapping are available (name -> id; id -> name):
2. Many functions use timezone information to convert timestamps, dates, and times across timezones. This is done using an external library available here:
   This library is compatible with the timezone support that C++20 adds to std::chrono:
   It takes a string with the timezone name and uses the local tzdata package to perform timezone conversions.
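The two-way mapping in item 1 can be pictured as a hash map in one direction and a dense vector indexed by id in the other. TimeZoneIdMap below is a hypothetical class, and the ids used with it are made up; Velox's real table is generated from Presto and has its own ids.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative two-way timezone name <-> 16-bit id table.
class TimeZoneIdMap {
 public:
  explicit TimeZoneIdMap(
      std::vector<std::pair<std::int16_t, std::string>> entries) {
    for (auto& [id, name] : entries) {
      if (id < 0) {
        continue;  // this sketch only supports non-negative ids
      }
      nameToId_.emplace(name, id);
      const auto index = static_cast<std::size_t>(id);
      if (idToName_.size() <= index) {
        idToName_.resize(index + 1);  // unused ids stay as empty strings
      }
      idToName_[index] = std::move(name);
    }
  }

  // name -> id, e.g. when a timezone string appears in a query.
  std::optional<std::int16_t> idOf(const std::string& name) const {
    auto it = nameToId_.find(name);
    if (it == nameToId_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

  // id -> name, e.g. when formatting a stored TimestampWithTimezone value.
  std::optional<std::string_view> nameOf(std::int16_t id) const {
    if (id < 0 || static_cast<std::size_t>(id) >= idToName_.size() ||
        idToName_[static_cast<std::size_t>(id)].empty()) {
      return std::nullopt;
    }
    return std::string_view(idToName_[static_cast<std::size_t>(id)]);
  }

 private:
  std::unordered_map<std::string, std::int16_t> nameToId_;
  std::vector<std::string> idToName_;
};
```

Note that a name round-tripping through this table says nothing about whether the conversion library in item 2 can resolve it, which is exactly the gap described in A) below.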
===
There are two problems with the current setup:
A) These two mappings are not synchronized. Timezone names available in the Presto mapping (item 1 above) are not necessarily understood by external/date. This means that some timezones we can represent fail when we try to use them in conversions (#7804 is an example).
B) Inefficient conversion. For timezones that happen to be available in both mappings, the common conversion code consists of (a) first mapping the stored integer id back to the string name, (b) using the string name to find the pointer to the correct conversion object, then (c) actually performing the conversion.
format_datetime() is an example:
===
To address the issues above, we should consider:
X) Ensure that every timezone name supported in Velox (the ones in the Presto-based mapping) has a corresponding entry in external/date, with tests that cover every timezone. There is good overlap, but the two sets are not the same. In particular, fixed offsets like "03:00" and "-07:30" are supported by Presto but not by external/date.
Y) Provide a cached mapping from "id" -> "timezone object pointer" to improve efficiency in the common conversion path.
Z) Work with the Presto community to add missing timezone names. It was reported that at least EST, CST, and similar names are official timezones available in tzdata/tzdump, but they are not supported by Presto (and hence not available in Velox).
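Proposals X and Y fit together: if the id -> zone-pointer cache is built eagerly from the full mapping, every name is resolved once up front, and any name the library cannot resolve is surfaced immediately (which a test can assert is empty). The sketch below assumes this design; ZoneCache and ZoneHandle are hypothetical names, ZoneHandle stands in for the library's time zone object, and the resolver is a plain callback rather than a real external/date call.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the conversion library's time zone object.
struct ZoneHandle {
  std::string name;
};

// Hypothetical id -> zone-pointer cache. The hot path becomes one vector
// index instead of id -> name -> lookup -> convert.
class ZoneCache {
 public:
  using Resolver = std::function<const ZoneHandle*(const std::string&)>;

  ZoneCache(const std::vector<std::pair<std::int16_t, std::string>>& mapping,
            const Resolver& resolve) {
    for (const auto& [id, name] : mapping) {
      if (id < 0) {
        continue;  // this sketch only supports non-negative ids
      }
      const auto index = static_cast<std::size_t>(id);
      if (zones_.size() <= index) {
        zones_.resize(index + 1, nullptr);
      }
      zones_[index] = resolve(name);
      if (zones_[index] == nullptr) {
        unresolved_.push_back(name);  // coverage gap, as described in X
      }
    }
  }

  // O(1) lookup on the conversion path; nullptr for unknown ids.
  const ZoneHandle* get(std::int16_t id) const {
    if (id < 0 || static_cast<std::size_t>(id) >= zones_.size()) {
      return nullptr;
    }
    return zones_[static_cast<std::size_t>(id)];
  }

  // Names from the mapping that the resolver could not handle.
  const std::vector<std::string>& unresolved() const { return unresolved_; }

 private:
  std::vector<const ZoneHandle*> zones_;
  std::vector<std::string> unresolved_;
};
```

A vector indexed by id is a natural choice here because the ids are small, dense 16-bit integers, so the whole table stays compact and each lookup avoids hashing or string comparison.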
Cc: @mbasmanova @PHILO-HE @aditi-pandit @majetideepak @zacw7 @tanjialiang @gggrace14 @svm1