-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use timezone-aware objects [mandate UTC in low-level vars for code safety, using Python 3.2's datetime.now(timezone.utc)
] [should Python 3.3's time.monotonic()
be used to approximate elapsed time?]
#217
base: master
Are you sure you want to change the base?
Conversation
Underlying design discussion (likely still evolving, towards a KISS design decision we hope!) is here — to give others much more context: 🙏 |
|
I am not sure I understand you fully. Does this explanation from Gemini fit for this discussion? Is the following snippet + result what you expected?
Current UTC Time: 2024-07-06 23:36:38.923000+00:00 |
Ok, I agree that storing
...compared to recommendations from prior years, below, which (until Python 3.12!) would have worked fine...
RECAP: Sounds like we should retitle this PR... clarifying that another/core purpose of this PR is to modernize Python code to be compatible with Python 3.12+ ? |
datetime.now(timezone.utc)
]
@@ -231,7 +231,8 @@ def error(self, x): | |||
|
|||
@property | |||
def runtime(self): | |||
return (self.end_time or datetime.now()) - self.start_time | |||
now = datetime.now(timezone.utc) | |||
return (self.end_time or now) - self.start_time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are start_time and end_time exclusively used to compute the elapsed time in seconds?
If so, you don't need to use datetime objects, which are complex classes meant to represent calendar dates and localized time.
Internally, Linux and Windows represent time in (fractions of) seconds since a certain epoch. This system clock is the source for datetime objects, but you can also use it directly whenever you want to compute the delta:
>>> import time
>>> start_time = time.time()
>>> start_time
1720312378.525805
>>> type(start_time)
<class 'float'>
>>>
>>> end_time = time.time()
>>> end_time
1720312401.982123
>>> elapsed = end_time - start_time
>>> elapsed
23.456317901611328
>>> print(f"This example took exactly {elapsed} seconds to type, but we don't know when it started and ended in your timezone.")
This example took exactly 23.456317901611328 seconds to type, but we don't know when it started and ended in your timezone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important caveat from the docs of time.time():
While this function normally returns non-decreasing values, it can return a lower value than a previous call if the system clock has been set back between the two calls.
This problem also affects datetime.datetime.now(datetime.timezone.utc)
which you're using now, because the time can jump due to NTP, daylight saving or the system administrator simply setting the clock.
A more robust way to compute time differences is using time.monotonic()
, which exists since Python 3.3:
>>> start_time = time.monotonic()
>>> end_time = time.monotonic()
>>> elapsed = end_time - start_time
>>> elapsed
2.559895193000557
>>> type(elapsed)
<class 'float'>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Very useful knowledge: thanks @codewiz.
-
Maybe everyone sleep on this overnight to consider whether
datetime.now(timezone.utc)
IS-OR-IS-NOT appropriate for these Prevent [upcoming and truly live YouTube] videos from carrying over [to] subsequent download tasks [& discussion of UTC for code safety — avoid TZ code complexity wherever possible!?] #212 use cases ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time can jump due to [1] NTP, [2] daylight saving or [3] the system administrator simply setting the clock
@deldesir above is a good reminder that mandating UTC is a "step forward" — but possibly might not be quite enough... ?
- Mandating "UTC as perceived by the local machine" when recording time snapshots... does eliminate 1 of 3 ways that time/clock can go backwards (i.e. protecting against impromptu daylight savings time changes!) ✅
- BUT, the other 2 ways that time/clock can spontaneously go backwards (NTP and sysadmin simply setting the clock) remain risks/considerations we need to at least think through — do we really want to live with these 2 risks — or is
time.monotonic()
(or similar) better for some of these uses cases?⚠️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are start_time and end_time exclusively used to compute the elapsed time in seconds?
No. They're mainly to render the localized task status based on the runtime per https://github.com/iiab/calibre-web/blob/master/cps/tasks_status.py
If so, you don't need to use datetime objects, which are complex classes meant to represent calendar dates and localized time.
Internally, Linux and Windows represent time in (fractions of) seconds since a certain epoch. This system clock is the source for datetime objects, but you can also use it directly whenever you want to compute the delta:
I agree elapsed time computation can and must be simplified, however we're dealing with a codebase in which mainly naive datetime objects are used. Just lookup datetime.nowutc
and see for yourself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Mandating "UTC as perceived by the local machine" when recording time snapshots... does eliminate 1 of 3 ways that time/clock can go backwards (i.e. protecting against impromptu daylight savings time changes!) ✅
Also be aware of a 4th way IIAB system clock can change (go forward but not backwards!) rather automatically:
(e.g. when someone browses IIAB from a nearby phone / mobile device in a school!)
Summary of above PR from May 2019:
the apache_allow_sudo flag must be true (set with local vars) and the menu config must have allow_server_time_update true (set through the admin console)
if a device is mobile, as determined by the detectMobile function or has win64 or macintel in its device string then call updateServerTime
updateServerTime calls a php script that checks to see if the time has been set with NTP and if so returns. It then checks to see if the UTC version of the system time is more than 2 minutes behind the browser's UTC time and if so it uses date to set the clock to the browser's UTC time and fake-hwclock to save it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree elapsed time computation can and must be simplified
We're all aware of the counterargument of course:
- Curatorial use cases almost always require that IIAB be online — so relying on (what system clock claims to be) UTC might possibly be good enough (??)
- So, taking a step back to articulate when + where + why we truly need "elapsed time" calculations might help!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't understand why "elapsed time" calculations became the central point of this discussion. This PR does only one thing, convert naive datetime objects into timezone-aware datetime objects, nothing else. The "elapsed time" calculations were always there and we can see their usefulness in the "tasks" page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deeper Question: what should be the goal of this PR?
If the answer is safer time handling, that might be achieved in ~5 main ways:
- Disregard the local machine's TZ setting (time zone setting can and commonly does change during IIAB provisioning!)
- Disregard twice-annual daylight savings time changes (that occur in many time zones!)
- Disregard machine time changes triggered by NTP
- Disregard machine time changes triggered by nearby mobile devices whose time (according to their browser headers) is > 2min ahead of IIAB machine's time
- Disregard machine time changes triggered by sysadmin manual intervention
RECAP: So far this PR transitions to using the local machine's best guess at UTC time — which helps solve 1. and 2.
(But debatably this PR might not be quite enough to address considerations 3., 4. and 5. ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick RECAP of @codewiz's suggestion from ~2 days ago:
Python 3.3's time.monotonic()
should at least be considered for the "Run Time" column of "Tasks" view — and any other situations where "elapsed time" (or similar) are important... ?
- time.monotonic() (PEP 418)
- Much like Linux CLI command:
cat /proc/uptime
- Much like Linux CLI command:
uptime
@@ -151,7 +151,7 @@ def _add_download_tasks_to_worker(self, requested_urls): | |||
def run(self, worker_thread): | |||
self.worker_thread = worker_thread | |||
log.info("Starting to fetch metadata for URL: %s", self.media_url) | |||
self.start_time = self.end_time = datetime.now() | |||
self.end_time = datetime.now(timezone.utc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this was also setting start_time. No longer needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deldesir take a look, Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I made this change to make the elapsed time start immediately when the task is called instead of when the worker thread is started. Not significant though, but more realistic IMHO.
@@ -23,7 +23,7 @@ def __init__(self, task_message, media_url, original_url, current_user_name): | |||
self.original_url = self._format_original_url(original_url) | |||
self.is_playlist = None | |||
self.current_user_name = current_user_name | |||
self.start_time = self.end_time = datetime.now() | |||
self.start_time = self.end_time = datetime.now(timezone.utc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already being done in super(...).__init__(...)
, the constructor of CalibreTask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deldesir do you agree? (Do revise if so!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the reason I made the previous change to consider the time set in the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the reason I made the previous change to consider the time set in the constructor.
@deldesir Plz clarify the exact line number to help me understand? Thx!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -110,7 +110,7 @@ def _update_metadata(self, requested_urls): | |||
requested_urls = {url: requested_urls[url] for url in requested_urls.keys() if "shorts" not in url and url not in failed_urls} | |||
|
|||
def _calculate_views_per_day(self, requested_urls, conn): | |||
now = datetime.now() | |||
now = datetime.now(timezone.utc) | |||
for requested_url in requested_urls.keys(): | |||
try: | |||
view_count = conn.execute("SELECT view_count FROM media WHERE path = ?", (requested_url,)).fetchone()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On line 117, what's the type of the time_uploaded field in the database?
Could you point me at the code writing this field, to ensure it's actually a POSIX timestamp (aka UNIX epoch)?
Also, datestamp.utcfromtimestamp() creates a naive datestamp, so if this is really a UNIX epoch, we should write:
time_uploaded_raw = conn.execute("...").fetchone()[0]
time_uploaded = datetime.fromtimestamp(time_uploaded_raw, timezone.utc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the code writing time_uploaded
at line 186:
conn.execute("UPDATE playlists SET path = ? WHERE path = ?", (f"{self.media_url}×tamp={int(datetime.now().timestamp())}", self.media_url))
Since int(datetime.now().timestamp())
is the same of int(time.time())
, this can be simplified.
Note that most SQL databases have types like TIMESTAMP and DATETIME, but SQLite by convention uses integer POSIX timestamps. So we're already storing the correct number here, but we could directly use int(time.time())
instead of converting the system time to datetime and then back to seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -110,7 +110,7 @@ def _update_metadata(self, requested_urls): | |||
requested_urls = {url: requested_urls[url] for url in requested_urls.keys() if "shorts" not in url and url not in failed_urls} | |||
|
|||
def _calculate_views_per_day(self, requested_urls, conn): | |||
now = datetime.now() | |||
now = datetime.now(timezone.utc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The advice of using time.monotonic()
does NOT apply here, because later time_uploaded is subtracted from it to compute a time difference.
Both time.monotonic()
and time.time()
are floats, but they are not based on t he same epoch:
>>> import time
>>> time.monotonic()
180162.46153954
>>> time.time()
1720317895.1978724
Thus, subtracting a monotonic clock from a system clock is a serious bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @codewiz for your explanations. All theses changes were not made to enhance the runtime computation. They were motivated by
calibre-web/cps/tasks/metadata_extract.py
Lines 66 to 70 in aab3a75
query = ("SELECT path, duration, live_status FROM media WHERE error IS NULL AND path LIKE 'http%' AND time_created > ?" | |
if "error" in self.columns | |
else "SELECT path, duration, live_status FROM media WHERE path LIKE 'http%' AND time_created > ?") | |
rows = conn.execute(query, (int(self.start_time.timestamp()),)).fetchall() | |
requested_urls = {} |
query = ("SELECT path, duration, live_status FROM media WHERE error IS NULL AND path LIKE 'http%' AND time_created > ?"
if "error" in self.columns
else "SELECT path, duration, live_status FROM media WHERE path LIKE 'http%' AND time_created > ?")
rows = conn.execute(query, (int(self.start_time.timestamp()),)).fetchall()
datetime.now(timezone.utc)
]datetime.now(timezone.utc)
] [should time.monotonic()
be used to approximate elapsed time?]
datetime.now(timezone.utc)
] [should time.monotonic()
be used to approximate elapsed time?]datetime.now(timezone.utc)
] [should Python 3.3's time.monotonic()
be used to approximate elapsed time?]
@deldesir FYI merge conflict as a result of the just-merged: |
Re: 6c9e200 (https://github.com/deldesir/calibre-web/blob/6c9e2000340aac615ab394ea12bb4a4877f156d6/cps/tasks/download.py#L56-L57) @deldesir isn't a main point of this PR to change line 56 from... ?
...to...
|
Yes, I'll make the change. |
@deldesir please confirm my resolving of the 2 merge conflicts in (Arising from...) |
Your resolving of the 2 merge conflicts in cps/tasks/download.py (2a12fbc#diff-8ae9358844e7383eaeb354c71c7eb20f7010536f7bad834364d731d93154f79a) is on point. |
This PR addresses modified the runtime property in CalibreTask class to ensure datetime objects are timezone-aware. It also updates
start_time
andend_time
from the metadata fetch and download tasks accordingly.Tested on Ubuntu 24.04 with python 3.12 (LRN2)