-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for time zone offsets to TimeZone #10577
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,15 +25,20 @@ | |
|
||
namespace facebook::velox::tz { | ||
|
||
using TTimeZoneDatabase = std::vector<std::unique_ptr<TimeZone>>; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mbasmanova I ended up merging both PRs into one; this is the final state as I mentioned in the previous review. |
||
using TTimeZoneIndex = folly::F14FastMap<std::string, const TimeZone*>; | ||
|
||
// Defined in TimeZoneDatabase.cpp | ||
extern const std::vector<std::pair<int16_t, std::string>>& getTimeZoneEntries(); | ||
|
||
// TODO: The string will be moved to TimeZone in the next PR. | ||
using TTimeZoneDatabase = std::vector<std::unique_ptr<std::string>>; | ||
using TTimeZoneIndex = folly::F14FastMap<std::string, int16_t>; | ||
|
||
namespace { | ||
|
||
// Returns the offset in minutes for a specific time zone offset in the | ||
// database. Do not call for tzID 0 (UTC / "+00:00"). | ||
inline std::chrono::minutes getTimeZoneOffset(int16_t tzID) { | ||
return std::chrono::minutes{(tzID <= 840) ? (tzID - 841) : (tzID - 840)}; | ||
} | ||
|
||
// Flattens the input vector of pairs into a vector, assuming that the | ||
// timezoneIDs are (mostly) sequential. Note that since they are "mostly" | ||
// senquential, the vector can have holes. But it is still more efficient than | ||
|
@@ -44,7 +49,23 @@ TTimeZoneDatabase buildTimeZoneDatabase( | |
tzDatabase.resize(dbInput.back().first + 1); | ||
|
||
for (const auto& entry : dbInput) { | ||
tzDatabase[entry.first] = std::make_unique<std::string>(entry.second); | ||
std::unique_ptr<TimeZone> timeZonePtr; | ||
|
||
if (entry.first == 0) { | ||
timeZonePtr = std::make_unique<TimeZone>( | ||
"UTC", entry.first, date::locate_zone("UTC")); | ||
} else if (entry.first <= 1680) { | ||
std::chrono::minutes offset = getTimeZoneOffset(entry.first); | ||
timeZonePtr = | ||
std::make_unique<TimeZone>(entry.second, entry.first, offset); | ||
} | ||
// Every single other time zone entry (outside of offsets) needs to be | ||
// available in external/date or this will throw. | ||
else { | ||
timeZonePtr = std::make_unique<TimeZone>( | ||
entry.second, entry.first, date::locate_zone(entry.second)); | ||
} | ||
Comment on lines
+62
to
+67
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems that I am not sure but probably this change causes Gluten's CentOS 8 tests to throw
Can we loose the restriction here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @zhztheplayer I guess you are using an outdated version of the tzdata database in your system. Could you update it? Which version are you using? You can check with something like
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's by default [root@47881716f3de /]# rpm -qa | grep tzdata
tzdata-2021a-1.el8.noarch
[root@47881716f3de /]# ls /usr/share/zoneinfo/Europe/
Amsterdam Belfast Brussels Chisinau Guernsey Jersey Lisbon Madrid Monaco Paris Rome Saratov Stockholm Ulyanovsk Vienna Zagreb
Andorra Belgrade Bucharest Copenhagen Helsinki Kaliningrad Ljubljana Malta Moscow Podgorica Samara Simferopol Tallinn Uzhgorod Vilnius Zaporozhye
Astrakhan Berlin Budapest Dublin Isle_of_Man Kiev London Mariehamn Nicosia Prague San_Marino Skopje Tirane Vaduz Volgograd Zurich
Athens Bratislava Busingen Gibraltar Istanbul Kirov Luxembourg Minsk Oslo Riga Sarajevo Sofia Tiraspol Vatican Warsaw There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah, I suppose a simple update to the tzdata package should solve it. We could also do what you suggested above; allow timezone without a tzdata database, and throw at runtime if they try to be used. If you think that's useful, could you open an Issue? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure. And we have been using a simple fix for the issue in Gluten. I can open a PR using that code. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
tzDatabase[entry.first] = std::move(timeZonePtr); | ||
} | ||
return tzDatabase; | ||
} | ||
|
@@ -59,14 +80,19 @@ const TTimeZoneDatabase& getTimeZoneDatabase() { | |
// reverse look ups. | ||
TTimeZoneIndex buildTimeZoneIndex(const TTimeZoneDatabase& tzDatabase) { | ||
TTimeZoneIndex reversed; | ||
reversed.reserve(tzDatabase.size() + 1); | ||
reversed.reserve(tzDatabase.size() + 2); | ||
|
||
for (int16_t i = 0; i < tzDatabase.size(); ++i) { | ||
if (tzDatabase[i] != nullptr) { | ||
reversed.emplace(boost::algorithm::to_lower_copy(*tzDatabase[i]), i); | ||
reversed.emplace( | ||
boost::algorithm::to_lower_copy(tzDatabase[i]->name()), | ||
tzDatabase[i].get()); | ||
} | ||
} | ||
reversed.emplace("utc", 0); | ||
|
||
// Add aliases to UTC. | ||
reversed.emplace("+00:00", tzDatabase.front().get()); | ||
reversed.emplace("-00:00", tzDatabase.front().get()); | ||
return reversed; | ||
} | ||
|
||
|
@@ -157,10 +183,10 @@ std::string getTimeZoneName(int64_t timeZoneID) { | |
timeZoneDatabase[timeZoneID], | ||
"Unable to resolve timeZoneID '{}'", | ||
timeZoneID); | ||
return *timeZoneDatabase[timeZoneID]; | ||
return timeZoneDatabase[timeZoneID]->name(); | ||
} | ||
|
||
int16_t getTimeZoneID(std::string_view timeZone, bool failOnError) { | ||
const TimeZone* locateZone(std::string_view timeZone, bool failOnError) { | ||
const auto& timeZoneIndex = getTimeZoneIndex(); | ||
|
||
std::string timeZoneLowered; | ||
|
@@ -177,10 +203,16 @@ int16_t getTimeZoneID(std::string_view timeZone, bool failOnError) { | |
if (it != timeZoneIndex.end()) { | ||
return it->second; | ||
} | ||
|
||
if (failOnError) { | ||
VELOX_USER_FAIL("Unknown time zone: '{}'", timeZone); | ||
} | ||
return -1; | ||
return nullptr; | ||
} | ||
|
||
int16_t getTimeZoneID(std::string_view timeZone, bool failOnError) { | ||
const TimeZone* tz = locateZone(timeZone, failOnError); | ||
return tz == nullptr ? -1 : tz->id(); | ||
} | ||
|
||
int16_t getTimeZoneID(int32_t offsetMinutes) { | ||
|
@@ -209,8 +241,37 @@ int16_t getTimeZoneID(int32_t offsetMinutes) { | |
} | ||
} | ||
|
||
const TimeZone* locateZone(std::string_view timeZone) { | ||
return date::locate_zone(timeZone); | ||
TimeZone::seconds TimeZone::to_sys( | ||
TimeZone::seconds timestamp, | ||
TimeZone::TChoose choose) const { | ||
date::local_seconds timePoint{timestamp}; | ||
|
||
if (tz_ == nullptr) { | ||
// We can ignore `choose` as time offset conversions are always linear. | ||
return (timePoint - offset_).time_since_epoch(); | ||
} | ||
|
||
if (choose == TimeZone::TChoose::kFail) { | ||
// By default, throws. | ||
return date::zoned_time{tz_, timePoint}.get_sys_time().time_since_epoch(); | ||
} | ||
|
||
auto dateChoose = (choose == TimeZone::TChoose::kEarliest) | ||
? date::choose::earliest | ||
: date::choose::latest; | ||
return date::zoned_time{tz_, timePoint, dateChoose} | ||
.get_sys_time() | ||
.time_since_epoch(); | ||
} | ||
|
||
TimeZone::seconds TimeZone::to_local(TimeZone::seconds timestamp) const { | ||
date::sys_seconds timePoint{timestamp}; | ||
|
||
// If this is an offset time zone. | ||
if (tz_ == nullptr) { | ||
return (timePoint + offset_).time_since_epoch(); | ||
} | ||
return date::zoned_time{tz_, timePoint}.get_local_time().time_since_epoch(); | ||
} | ||
|
||
} // namespace facebook::velox::tz |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,6 +16,7 @@ | |
|
||
#pragma once | ||
|
||
#include <chrono> | ||
#include <string> | ||
|
||
namespace facebook::velox::date { | ||
|
@@ -24,28 +25,25 @@ class time_zone; | |
|
||
namespace facebook::velox::tz { | ||
|
||
/// This library provides time zone lookup and mapping utilities, in addition to | ||
/// functions to enable timestamp conversions across time zones. It leverages | ||
/// the velox/external/date underneath to perform conversions. | ||
/// This library provides time zone management primitives. It maintains an | ||
/// internal static database which is contructed lazily based on the first | ||
/// access, based on TimeZoneDatabase.cpp and the local tzdata installed in your | ||
/// system (through velox/external/date). | ||
/// | ||
/// This library provides a thin layer of functionality on top of | ||
/// velox/external/date for timezone lookup and conversions, so don't use the | ||
/// external library directly. | ||
|
||
/// TimeZone is the object that allows conversions across timezones using the | ||
/// .to_sys() and .to_local() methods, as documented in: | ||
/// | ||
/// https://howardhinnant.github.io/date/tz.html | ||
/// It provides functions for one to lookup TimeZone pointers based on time zone | ||
/// name or ID, and to performance timestamp conversion across time zones. | ||
/// | ||
using TimeZone = date::time_zone; | ||
/// This library provides a layer of functionality on top of | ||
/// velox/external/date, so do not use the external library directly for | ||
/// time zone routines. | ||
|
||
/// TimeZone pointers can be found using `locateZone()`. | ||
/// | ||
/// This function in mostly implemented by velox/external/date, and performs a | ||
/// binary search in the internal time zone database. On the first call, | ||
/// velox/external/date will initialize a static list of timezone, read from the | ||
/// local tzdata database. | ||
const TimeZone* locateZone(std::string_view timeZone); | ||
class TimeZone; | ||
|
||
/// Looks up a TimeZone pointer based on a time zone name. This makes an hash | ||
/// map access, and will construct the index on the first access. `failOnError` | ||
/// controls whether to throw or return nullptr in case the time zone was not | ||
/// found. | ||
const TimeZone* locateZone(std::string_view timeZone, bool failOnError = true); | ||
|
||
/// Returns the timezone name associated with timeZoneID. | ||
std::string getTimeZoneName(int64_t timeZoneID); | ||
|
@@ -59,6 +57,89 @@ int16_t getTimeZoneID(std::string_view timeZone, bool failOnError = true); | |
/// [-14:00, +14:00] range. | ||
int16_t getTimeZoneID(int32_t offsetMinutes); | ||
|
||
/// TimeZone is the proxy object for time zone management. It provides access to | ||
/// time zone names, their IDs (as defined in TimeZoneDatabase.cpp and | ||
/// consistent with Presto), and utilities for timestamp conversion across | ||
/// timezones by leveraging the .to_sys() and .to_local() methods as documented | ||
/// in: | ||
/// | ||
/// https://howardhinnant.github.io/date/tz.html | ||
/// | ||
/// Do not create your own objects; rather, look up a pointer by using one of | ||
/// the methods above. | ||
class TimeZone { | ||
public: | ||
// Constructor for regular time zones with a name and a pointer to | ||
// external/date time zone database (from tzdata). | ||
TimeZone( | ||
std::string_view timeZoneName, | ||
int16_t timeZoneID, | ||
const date::time_zone* tz) | ||
: tz_(tz), | ||
offset_(0), | ||
timeZoneName_(timeZoneName), | ||
timeZoneID_(timeZoneID) {} | ||
|
||
// Constructor for time zone offsets ("+00:00"). | ||
TimeZone( | ||
std::string_view timeZoneName, | ||
int16_t timeZoneID, | ||
std::chrono::minutes offset) | ||
: tz_(nullptr), | ||
offset_(offset), | ||
timeZoneName_(timeZoneName), | ||
timeZoneID_(timeZoneID) {} | ||
|
||
// Do not copy it. | ||
TimeZone(const TimeZone&) = delete; | ||
TimeZone& operator=(const TimeZone&) = delete; | ||
|
||
using seconds = std::chrono::seconds; | ||
|
||
/// Converts a local time (the time as perceived in the user time zone | ||
/// represented by this object) to a system time (the corresponding time in | ||
/// GMT at the same instant). | ||
/// | ||
/// Conversions from local time to GMT are non-linear and may be ambiguous | ||
/// during day light savings transitions, or non existent. By default (kFail), | ||
/// `to_sys()` will throw `date::ambiguous_local_time` and | ||
/// `date::nonexistent_local_time` in these cases. | ||
/// | ||
/// You can overwrite the behavior in ambiguous conversions by setting the | ||
/// TChoose flag, but it will still throws in case of nonexistent conversions. | ||
enum class TChoose { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. need to define our own here so we don't introduce a header to header dependency |
||
kFail = 0, | ||
kEarliest = 1, | ||
kLatest = 2, | ||
}; | ||
|
||
seconds to_sys(seconds timestamp, TChoose choose = TChoose::kFail) const; | ||
|
||
/// Do the opposite conversion. Taking a system time (the time as perceived in | ||
/// GMT), convert to the same instant in time as observed in the user local | ||
/// time represented by this object). Note that this conversion is not | ||
/// susceptible to the error above. | ||
seconds to_local(seconds timestamp) const; | ||
|
||
const std::string& name() const { | ||
return timeZoneName_; | ||
} | ||
|
||
int16_t id() const { | ||
return timeZoneID_; | ||
} | ||
|
||
const date::time_zone* tz() const { | ||
return tz_; | ||
} | ||
|
||
private: | ||
const date::time_zone* tz_{nullptr}; | ||
const std::chrono::minutes offset_{0}; | ||
const std::string timeZoneName_; | ||
const int16_t timeZoneID_; | ||
}; | ||
|
||
} // namespace facebook::velox::tz | ||
|
||
#ifdef VELOX_ENABLE_BACKWARD_COMPATIBILITY | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to break the header dependency