Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove residual [duplicate] YouTube Shorts urls from database #155

Merged
merged 3 commits into from
May 31, 2024

Conversation

deldesir
Copy link
Collaborator

@deldesir deldesir commented Apr 16, 2024

This PR enables downloading of channels that contains Shorts. It does 2 things:
1- Remove shorts duplicates from database
2- Ignore short videos when downloading playlists and channels

Fixes #154

Tested with https://www.youtube.com/@AutoApps. MAX_VIDEOS_PER_DOWNLOAD was set to 530 to account for the maximum videos downloads since the channels contains 523 videos. 415/415 were successfully processed for metadata, ignoring all shorts.

1- Remove shorts duplicates from database
2- Ignore short videos when downloading playlists and channels
@deldesir deldesir self-assigned this Apr 16, 2024
@deldesir deldesir requested a review from holta April 16, 2024 18:20
@holta
Copy link
Member

holta commented Apr 16, 2024

2- Ignore short videos when downloading playlists and channels

Quick questions:

  • YouTube "Shorts" will not be downloaded?

  • Explain why this is advisable or necessary?

  • Is this a temporary measure?

(Thanks for making the context/motivation more clear!)

Comment on lines 106 to 109

def _ignore_shorts(self, requested_urls):
requested_urls = {url: requested_urls[url] for url in requested_urls.keys() if requested_urls[url]["duration"] > 60}

Copy link
Collaborator Author

@deldesir deldesir Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • YouTube "Shorts" will not be downloaded?

Exactly. The function _ignore_shorts defaults playlists and channels downloads to ignoring Shorts

  • Explain why this is advisable or necessary?

It's there to support it as an optional feature. Like constants MAX_VIDEOS_PER_DOWNLOAD and MAX_GB_PER_DOWNLOAD, we can make it configurable for the user to toggle it on/off.

  • Is this a temporary measure?

It doesnt affect the downloading per se. However the other function introduced (_remove_shorts_from_db) is a temporary measure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like constants MAX_VIDEOS_PER_DOWNLOAD

Should the option to ignore "Shorts" be part of this PR, if it's straightforward to implement safely?

Copy link
Collaborator Author

@deldesir deldesir Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving it into another PR might make it more feature-relevant than portraying it as a fix. Zero confusion is better.

Copy link
Member

@holta holta May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving it into another PR might make it more feature-relevant than portraying it as a fix. Zero confusion is better.

Please link to any PR or ticket explaining what should happen next:

How should people who want "YouTube Shorts" included when downloading channels/playlists/etc... set that option at the command-line... prior to clicking "Download to IIAB" ?

Comment on lines 58 to +62

def _remove_shorts_from_db(self, conn):
conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'")
conn.commit()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is a temporary solution that deletes all entries with paths that contain "shorts". It fixes #154

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is a temporary solution that deletes all entries with paths that contain "shorts". It fixes #154

"temporary solution" means what?

(When should this PR be reverted-or-improved upon? Outline concrete next steps?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means this should be fixed in xklb. I need to reproduce the issue about short URLs duplicates first. Meanwhile it's safe to implement this temporary fix because it removes those URLs.

Copy link
Member

@holta holta May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this PR remove duplicate "YouTube Shorts" from xklb-metadata.db ?

(And if so, a question: does the removing happen in a quite safe or completely safe way — such that everything will continue to work in future, e.g. if xklb itself later removes duplicates?)

Copy link
Collaborator Author

@deldesir deldesir May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's safe enough. It will continue to work regardless of xklb implementing it or not. However, it's worth noting that if there are no "shorts" URLs in the database, the function will still execute a query to delete them, which might be unnecessary overhead. I think adding a condition to check for their existence first might enhance it.

Suggested change
def _remove_shorts_from_db(self, conn):
conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'")
conn.commit()
def _remove_shorts_from_db(self, conn):
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM media WHERE path LIKE '%shorts%'")
count = cursor.fetchone()[0]
if count > 0:
conn.execute("DELETE FROM media WHERE path LIKE '%shorts%'")
conn.commit()

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's tested very safe, and ideally simplest?

(Make a recommendation!)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code is the simplest and very safe. I recommend it.

@deldesir deldesir changed the title Handle YouTube Shorts Remove residual Shorts urls fro database Apr 16, 2024
@deldesir deldesir changed the title Remove residual Shorts urls fro database Remove residual Shorts urls from database Apr 16, 2024
@holta
Copy link
Member

holta commented Apr 16, 2024

Should @EMG70 test this PR ?

(And if so how!)

@deldesir
Copy link
Collaborator Author

Should @EMG70 test this PR ?

Yes. By downloading a whole channel.

@holta holta added the bug Something isn't working label May 2, 2024
@holta
Copy link
Member

holta commented May 2, 2024

Should @EMG70 test this PR ?

Yes. By downloading a whole channel.

@deldesir please test this and let us know that everything's solid — if @EMG70 (contributing in many other ways!) has no time here? ✅

@deldesir
Copy link
Collaborator Author

deldesir commented May 2, 2024

Okay. I will give some additional feedback on it.

@holta holta changed the title Remove residual Shorts urls from database Remove residual [duplicate] YouTube Shorts urls from database May 30, 2024
@holta
Copy link
Member

holta commented May 31, 2024

This PR was smoke-tested with #151 and #157 on t4 as explained here:

#157 (comment)

(Further functionality testing needed to fully confirm...)

@holta holta merged commit 478fd22 into iiab:master May 31, 2024
@deldesir deldesir deleted the deldesir-patch-27 branch July 1, 2024 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Channel downloads fail due to residual shorts duplicates
2 participants