-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove residual [duplicate] YouTube Shorts urls from database #155
Conversation
1- Remove shorts duplicates from database 2- Ignore short videos when downloading playlists and channels
Quick questions:
(Thanks for making the context/motivation more clear!) |
cps/tasks/metadata_extract.py
Outdated
|
||
def _ignore_shorts(self, requested_urls): | ||
requested_urls = {url: requested_urls[url] for url in requested_urls.keys() if requested_urls[url]["duration"] > 60} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- YouTube "Shorts" will not be downloaded?
Exactly. The function _ignore_shorts
defaults playlists and channels downloads to ignoring Shorts
- Explain why this is advisable or necessary?
It's there to support it as an optional feature. Like constants MAX_VIDEOS_PER_DOWNLOAD
and MAX_GB_PER_DOWNLOAD
, we can make it configurable for the user to toggle it on/off.
- Is this a temporary measure?
It doesnt affect the downloading per se. However the other function introduced (_remove_shorts_from_db
) is a temporary measure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like constants MAX_VIDEOS_PER_DOWNLOAD
Should the option to ignore "Shorts" be part of this PR, if it's straightforward to implement safely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving it into another PR might make it more feature-relevant than portraying it as a fix. Zero confusion is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving it into another PR might make it more feature-relevant than portraying it as a fix. Zero confusion is better.
Please link to any PR or ticket explaining what should happen next:
How should people who want "YouTube Shorts" included when downloading channels/playlists/etc... set that option at the command-line... prior to clicking "Download to IIAB" ?
|
||
def _remove_shorts_from_db(self, conn): | ||
conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'") | ||
conn.commit() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is a temporary solution that deletes all entries with paths that contain "shorts". It fixes #154
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is a temporary solution that deletes all entries with paths that contain "shorts". It fixes #154
"temporary solution" means what?
(When should this PR be reverted-or-improved upon? Outline concrete next steps?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means this should be fixed in xklb. I need to reproduce the issue about short URLs duplicates first. Meanwhile it's safe to implement this temporary fix because it removes those URLs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this PR remove duplicate "YouTube Shorts" from xklb-metadata.db ?
(And if so, a question: does the removing happen in a quite safe or completely safe way — such that everything will continue to work in future, e.g. if xklb itself later removes duplicates?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's safe enough. It will continue to work regardless of xklb implementing it or not. However, it's worth noting that if there are no "shorts" URLs in the database, the function will still execute a query to delete them, which might be unnecessary overhead. I think adding a condition to check for their existence first might enhance it.
def _remove_shorts_from_db(self, conn): | |
conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'") | |
conn.commit() | |
def _remove_shorts_from_db(self, conn): | |
cursor = conn.cursor() | |
cursor.execute("SELECT COUNT(*) FROM media WHERE path LIKE '%shorts%'") | |
count = cursor.fetchone()[0] | |
if count > 0: | |
conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'") | |
conn.commit() |
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's tested very safe, and ideally simplest?
(Make a recommendation!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code is the simplest and very safe. I recommend it.
Should @EMG70 test this PR ? (And if so how!) |
Yes. By downloading a whole channel. |
Okay. I will give some additional feedback on it. |
This PR enables downloading of channels that contains Shorts. It does 2 things:
1- Remove shorts duplicates from database
2- Ignore short videos when downloading playlists and channels
Fixes #154
Tested with https://www.youtube.com/@AutoApps. MAX_VIDEOS_PER_DOWNLOAD was set to 530 to account for the maximum videos downloads since the channels contains 523 videos. 415/415 were successfully processed for metadata, ignoring all shorts.