-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent stuck videos from carrying over to next download session ["unavailable fragments" ?] #193
Conversation
@codewiz can you review? |
@deldesir can you recommend any YouTube URL (video or playlist) known to regularly pollute (So we can all use this test case URL to confirm improved behavior!) |
Pollution and "carrying over" are 2 different things. The former happens when a downloading session is cancelled prematurely. For example, if you click on "cancel" button in the video row (Tasks page) or if you restart Calibre-Web while a playlist download is in progress. The latter happens in the case of failures due to unavailable fragments or because the video is/was a live. To answer your question, just restart Calibre-Web or click on cancel before the download task kicks in. |
else: | ||
log.error("No error found in the database, likely the video failed due to unavailable fragments.") | ||
self.message = f"{self.media_url_link} failed to download: No path found in the database" | ||
media_id = conn.execute("SELECT id FROM media WHERE webpath = ?", (self.media_url,)).fetchone()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is one record with this webpath guaranteed to exist in the media table, even when the query at line 89 returned no records?
Since the query above has the extra condition ... AND path NOT LIKE 'http%'
, it's possible that this less restrictive query might return one result (or more?).
If the query fails, trying to access field [0]
will throw, and you may need to add an extra check:
results = conn.execute("SELECT id FROM media WHERE webpath = ?", (self.media_url,)).fetchone()
if results:
media_id = results[0]
conn.execute("DELETE ...")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand your point well. I am confident one record with this webpath
will always be present because it's in fact "the" trigger of the actual download task. It's mandatory. However the condition AND path NOT LIKE 'http%'
might not be met under certain circumstances, i.e when the video fails to have it path
updated with a path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed how the media table is created and updated by xklb, and it seems that a record with webpath = media_url
will be inserted by the lb dl
child process at some point...
Could there be cases where the child process fails before updating the database?
If it's possible (even if rare), then this query will return 0 records, and this code should defensively check before trying to access the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The running of the child process alone attests the existence of webpath = media_url
. Yes there are cases where lb dl
can fail, for example live videos, videos without an available/suitable format, videos stuck due to unavailable fragments or network issues. But it will never return 0 record because it's "metadata fetch" that creates the record exists and ensures this specific record is used per
calibre-web/scripts/lb-wrapper
Line 84 in b854132
xklb_full_cmd="lb dl ${XKLB_DB_FILE} --video --search ${URL_OR_SEARCH_TERM} ${FORMAT_OPTIONS} --write-thumbnail --subs -o ${OUTTMPL} ${VERBOSITY}" |
cps/tasks/download.py
Outdated
@@ -111,6 +118,7 @@ def run(self, worker_thread): | |||
# 2024-02-17: Dedup Design Evolving... https://github.com/iiab/calibre-web/pull/125 | |||
conn.execute("UPDATE media SET path = ? WHERE webpath = ?", (new_video_path, self.media_url)) | |||
conn.execute("UPDATE media SET webpath = ? WHERE path = ?", (f"{self.media_url}×tamp={int(datetime.now().timestamp())}", new_video_path)) | |||
conn.commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified this too: commit() is required if the sqlite3 connection is in autocommit=False mode, which is the recommended value.
But it seems that autocommit =False is not the current default:
https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.autocommit
No change required here, but it's probably best to set autocommit=False on all sqlite connections? Otherwise, a crash happening between these two UPDATE statements would leave the database in an inconsistent state.
I also noticed that the conn.close()
at line 127 can be removed:
with sqlite3.connect(XKLB_DB_FILE) as conn:
...transactions done here...
# conn is implicitly closed when leaving the with block
conn.close() # this additional close has no effect and can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No change required here, but it's probably best to set autocommit=False on all sqlite connections?
FYI, sqlite3.connect(... autocommit =False)
requires Python 3.12, so it's probably best to avoid it while there are still users on older Python versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading the docs on how sqlite3.Connection objects interact with context managers, I realize that it doesn't work the same of file.open()
and other simple I/O objects.
with sqlite3.connect(XKLB_DB_FILE) as conn:
# ^-- the above context manager will implicitly start an SQL transaction
...queries executed here are part of the transaction...
# The transaction is committed, equivalent to `conn.commit()`
conn.close() # Still needed because the with block leaves the connection open
See: https://docs.python.org/3/library/sqlite3.html#how-to-use-the-connection-context-manager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words: you don't need to add conn.commit()
when you're already inside a with block and you will exit the block cleanly (e.g. without throwing an exception or crashing).
All review issues addressed, merge at your convenience. |
Let's try to add a bit more "Defensive Programming" to download.py before merging:
|
Co-authored-by: A Holt <[email protected]>
cps/tasks/download.py
Outdated
@@ -95,6 +95,12 @@ def run(self, worker_thread): | |||
if error: | |||
log.error("[xklb] An error occurred while trying to download %s: %s", error[1], error[0]) | |||
self.message = f"{error[1]} failed to download: {error[0]}" | |||
else: | |||
log.error("No error found in the database, likely the video failed due to unavailable fragments.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal should be very complete diagnostic hints in both... logs and in "Tasks" view error messages:
@deldesir: what log.error message is best?
log.error("No error found in the database, likely the video failed due to unavailable fragments.") | |
log.error("%s failed to download: No path or error found in the database (likely the video failed due to unavailable fragments?)", self.media_url_link) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best, but the use of self.media_url_link here is not appropriate in a log message. Use self.media_url instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, go ahead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @deldesir: please enact self.media_url
if that's best, and test!
@deldesir: Tested as safe enough to merge? |
Video failed due to unavailable fragments are not kept in the database. Those failed for other reasons are left but will not carry over the next download session.
Tested on Ubuntu 24.04 with #194