Videos do not download (or do not show up properly in UI) when CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE set to -1 #778

Closed
bverkron opened this issue Dec 28, 2024 · 7 comments

Comments

@bverkron

Describe the Bug

When CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE is set to -1, videos downloaded from YouTube (for example, this one) don't play. I get the following instead of a playable video:

Screen.Recording.2024-12-27.at.5.48.30.PM.mov

When setting CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE back to another value like 1000 it works. Default value (i.e. not including the env var) also works.

Steps to Reproduce

  1. Set CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE to -1
  2. Add url https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg to Hoarder
  3. Attempt to view the video after it's downloaded / processed

Expected Behaviour

Video is playable

Screenshots or Additional Context

Log entries from successful download (CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE set to 1000)

2024-12-28T01:39:12.708Z info: [Crawler][45] Will crawl "https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg" for link with id "totdckcx63gx6xnlwtcbiozi"
2024-12-28T01:39:12.709Z info: [Crawler][45] Attempting to determine the content-type for the url https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg
2024-12-28T01:39:12.772Z info: [search][46] Attempting to index bookmark with id totdckcx63gx6xnlwtcbiozi ...
2024-12-28T01:39:12.922Z info: [search][46] Completed successfully
2024-12-28T01:39:12.992Z info: [Crawler][45] Content-type for the url https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg is "text/html; charset=utf-8"
2024-12-28T01:39:16.053Z info: [Crawler][45] Successfully navigated to "https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg". Waiting for the page to load ...
2024-12-28T01:39:21.057Z info: [Crawler][45] Finished waiting for the page to load.
2024-12-28T01:39:21.265Z info: [Crawler][45] Successfully fetched the page content.
2024-12-28T01:39:21.609Z info: [Crawler][45] Finished capturing page content and a screenshot. FullPageScreenshot: false
2024-12-28T01:39:21.619Z info: [Crawler][45] Will attempt to extract metadata from page ...
2024-12-28T01:39:26.436Z info: [Crawler][45] Will attempt to extract readable content ...
2024-12-28T01:39:29.317Z info: [Crawler][45] Done extracting readable content.
2024-12-28T01:39:29.378Z info: [Crawler][45] Stored the screenshot as assetId: 0c0b7315-29d1-488c-ab33-602b9eefd7d5
2024-12-28T01:39:29.436Z info: [Crawler][45] Done extracting metadata from the page.
2024-12-28T01:39:29.437Z info: [Crawler][45] Downloading image from "https://i.ytimg.com/vi/Lw9Y_A5rzOs/maxres2.jpg?sqp=-oaymwEoCIAKENAF8quKqQMcGADwAQH4AbYIgAKAD4oCDAgAEAEYciBLKEAwDw==&rs=AOn4CLBYL4uSqtx5DMs9e-sE5MbFW6XmtA"
2024-12-28T01:39:29.521Z info: [Crawler][45] Downloaded image as assetId: 34d0f0f2-4bfb-478b-9578-1865b673eb09
2024-12-28T01:39:29.602Z info: [Crawler][45] Completed successfully
2024-12-28T01:39:30.415Z debug: [inference][47] No inference client configured, nothing to do now
2024-12-28T01:39:30.416Z info: [inference][47] Completed successfully
2024-12-28T01:39:30.470Z info: [search][48] Attempting to index bookmark with id totdckcx63gx6xnlwtcbiozi ...
2024-12-28T01:39:30.482Z info: [VideoCrawler][49] Attempting to download a file from "https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg" to "/tmp/video_downloads/bdbaf00b-9b02-4fa4-9369-8e4e632f7c9d" using the following arguments: "https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg,-f,best[filesize<1000M],-o,/tmp/video_downloads/bdbaf00b-9b02-4fa4-9369-8e4e632f7c9d,--no-playlist"
2024-12-28T01:39:30.574Z info: [search][48] Completed successfully
2024-12-28T01:39:35.136Z info: [VideoCrawler][49] Finished downloading a file from "https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg" to "/tmp/video_downloads/bdbaf00b-9b02-4fa4-9369-8e4e632f7c9d"
2024-12-28T01:39:35.177Z info: [VideoCrawler][49] Finished downloading video from "https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg" and adding it to the database
2024-12-28T01:39:35.178Z info: [VideoCrawler][49] Video Download Completed successfully

Log when set to -1

2024-12-28T01:44:48.903Z info: [Crawler][51] Will crawl "https://youtu.be/Lw9Y_A5rzOs?si=m1mYS19NUmXzkexr" for link with id "uskue5v4bdwpl8jzgbmcfh64"
2024-12-28T01:44:48.905Z info: [Crawler][51] Attempting to determine the content-type for the url https://youtu.be/Lw9Y_A5rzOs?si=m1mYS19NUmXzkexr
2024-12-28T01:44:49.071Z info: [search][52] Attempting to index bookmark with id uskue5v4bdwpl8jzgbmcfh64 ...
2024-12-28T01:44:49.143Z info: [Crawler][51] Content-type for the url https://youtu.be/Lw9Y_A5rzOs?si=m1mYS19NUmXzkexr is "text/html; charset=utf-8"
2024-12-28T01:44:49.151Z info: [search][52] Completed successfully
2024-12-28T01:44:51.860Z info: [Crawler][51] Successfully navigated to "https://youtu.be/Lw9Y_A5rzOs?si=m1mYS19NUmXzkexr". Waiting for the page to load ...
2024-12-28T01:44:56.861Z info: [Crawler][51] Finished waiting for the page to load.
2024-12-28T01:44:57.093Z info: [Crawler][51] Successfully fetched the page content.
2024-12-28T01:44:57.390Z info: [Crawler][51] Finished capturing page content and a screenshot. FullPageScreenshot: false
2024-12-28T01:44:57.403Z info: [Crawler][51] Will attempt to extract metadata from page ...
2024-12-28T01:45:02.529Z info: [Crawler][51] Will attempt to extract readable content ...
2024-12-28T01:45:05.685Z info: [Crawler][51] Done extracting readable content.
2024-12-28T01:45:05.745Z info: [Crawler][51] Stored the screenshot as assetId: cb93da1a-00e6-438c-af44-db72f37456a1
2024-12-28T01:45:05.789Z info: [Crawler][51] Done extracting metadata from the page.
2024-12-28T01:45:05.789Z info: [Crawler][51] Downloading image from "https://i.ytimg.com/vi/Lw9Y_A5rzOs/maxres2.jpg?sqp=-oaymwEoCIAKENAF8quKqQMcGADwAQH4AbYIgAKAD4oCDAgAEAEYciBLKEAwDw==&rs=AOn4CLBYL4uSqtx5DMs9e-sE5MbFW6XmtA"
2024-12-28T01:45:05.857Z info: [Crawler][51] Downloaded image as assetId: ccd5259b-49c8-4afc-8303-76078e2ca57d
2024-12-28T01:45:05.927Z info: [Crawler][51] Completed successfully
2024-12-28T01:45:06.777Z debug: [inference][53] No inference client configured, nothing to do now
2024-12-28T01:45:06.778Z info: [inference][53] Completed successfully
2024-12-28T01:45:06.834Z info: [search][54] Attempting to index bookmark with id uskue5v4bdwpl8jzgbmcfh64 ...
2024-12-28T01:45:06.848Z info: [VideoCrawler][55] Attempting to download a file from "https://youtu.be/Lw9Y_A5rzOs?si=m1mYS19NUmXzkexr" to "/tmp/video_downloads/454d5edb-f75d-4b56-8203-ad40613563b8" using the following arguments: "https://youtu.be/Lw9Y_A5rzOs?si=m1mYS19NUmXzkexr,-o,/tmp/video_downloads/454d5edb-f75d-4b56-8203-ad40613563b8,--no-playlist"
2024-12-28T01:45:06.937Z info: [search][54] Completed successfully

Device Details

Safari 17.6 on macOS

Exact Hoarder Version

v0.20.0

@bverkron
Author

The workaround is to set it to a very high number that will likely never be reached, like 9999999999999, which is effectively the same as having no limit.

@kamtschatka
Contributor

Works fine for me. Have you tried other browsers, to see whether Safari perhaps does not support the video format?

@bverkron
Author

It appears to work in Brave (i.e. a Chromium-based browser), but Hoarder must be doing something different with the video (aside from the compression, I assume) when the value is -1: any other value, even an arbitrarily large one like 9999999999999, makes the video playable in Safari.

@kamtschatka
Contributor

Hoarder merely passes this parameter to yt-dlp, which then chooses which file to download.
When -1 is provided, we skip adding the filesize filter; otherwise we add best[filesize<${maxVideoDownloadSize}M] to the arguments.
So it seems yt-dlp simply chooses a different version of the video in that case, and Safari really does have issues with some video formats.
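
A minimal sketch of the argument handling described above, reconstructed only from this comment and the two logged argument lists. The function name `buildYtDlpArgs` and its signature are hypothetical and are not Hoarder's actual code.

```typescript
// Hypothetical helper: mirrors the argument order seen in the VideoCrawler logs.
function buildYtDlpArgs(
  url: string,
  outputPath: string,
  maxVideoDownloadSize: number, // megabytes; -1 means "no limit"
): string[] {
  const args: string[] = [url];

  // With -1 no "-f" filter is added at all, so yt-dlp falls back to its own
  // default format selection instead of "best" constrained by file size.
  if (maxVideoDownloadSize !== -1) {
    args.push("-f", `best[filesize<${maxVideoDownloadSize}M]`);
  }

  args.push("-o", outputPath, "--no-playlist");
  return args;
}

// Usage: print both variants as comma-joined lists, the way the logs show them.
console.log(buildYtDlpArgs("https://youtu.be/Lw9Y_A5rzOs", "/tmp/video_downloads/example", 1000).join(","));
console.log(buildYtDlpArgs("https://youtu.be/Lw9Y_A5rzOs", "/tmp/video_downloads/example", -1).join(","));
```

The two cases differ in more than the size cap: the first always asks yt-dlp for `best` (a single file matching the filter), while the second leaves the format choice entirely to yt-dlp. That would be consistent with a very large limit and -1 producing different files.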

@bverkron
Author

It looks like with -1 set, the video is downloaded in AV1 format. Perhaps that's the original format YouTube serves, and with any other value passed it's converted to MP4. In either case, AV1 is relatively new and not widely supported yet, which probably explains the playback problems in Safari.
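
One way to check the codec theory in a given browser is `HTMLMediaElement.canPlayType`, which returns `""`, `"maybe"`, or `"probably"` for a MIME type. The sketch below is illustrative only: the two codec strings are assumed examples (AV1 Main profile and H.264 Baseline in an MP4 container), not values taken from the downloaded files, and `describeSupport` is a hypothetical helper.

```typescript
// Shape of HTMLMediaElement.canPlayType, passed in so the helper also runs outside a browser.
type CanPlayType = (mimeType: string) => string;

// Assumed example codec strings; the real files may use other profiles or containers.
const AV1_MP4 = 'video/mp4; codecs="av01.0.05M.08"';
const H264_MP4 = 'video/mp4; codecs="avc1.42E01E"';

interface CodecSupport {
  av1: boolean;
  h264: boolean;
}

// An empty string from canPlayType means "cannot play"; anything else is treated as possible support.
function describeSupport(canPlayType: CanPlayType): CodecSupport {
  return {
    av1: canPlayType(AV1_MP4) !== "",
    h264: canPlayType(H264_MP4) !== "",
  };
}

// In a browser console one could run:
//   const video = document.createElement("video");
//   describeSupport((mime) => video.canPlayType(mime));
```

If `av1` comes back `false` in Safari and `true` in Brave, that would match the behaviour reported in this thread.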

Video format with CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE = -1
[image: negative 1]

Video format with CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE = 9999999999999
[image: 99999999999]

Video format with CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE not set, I think. Same result as 9999999999999
[image: 50]

This kind of issue may be solved indirectly if #775 implements options to control the resolution, format, etc. Hoarder could then be configured to produce consistent formats, so these kinds of inconsistencies are avoided.
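
As a rough illustration of what such an option could look like (this is not what #775 proposes or implements), the sketch below combines an optional codec preference with the existing size limit. It relies on yt-dlp's format filter syntax, where `vcodec^=avc1` matches video codecs starting with `avc1` (H.264). The helper `buildFormatSelector` and its `preferredVcodecPrefix` parameter are hypothetical.

```typescript
// Hypothetical helper: builds the value for yt-dlp's "-f" flag, or undefined to omit the flag.
function buildFormatSelector(
  maxVideoDownloadSize: number, // megabytes; -1 means "no limit"
  preferredVcodecPrefix?: string, // e.g. "avc1" for H.264 (assumed option, not in Hoarder)
): string | undefined {
  const filters: string[] = [];

  if (preferredVcodecPrefix) {
    filters.push(`[vcodec^=${preferredVcodecPrefix}]`);
  }
  if (maxVideoDownloadSize !== -1) {
    filters.push(`[filesize<${maxVideoDownloadSize}M]`);
  }

  // No filters: leave format selection entirely to yt-dlp, as happens today with -1.
  if (filters.length === 0) {
    return undefined;
  }
  return `best${filters.join("")}`;
}

console.log(buildFormatSelector(-1, "avc1"));
console.log(buildFormatSelector(1000, "avc1"));
```

With a codec preference set, -1 would no longer change which codec is downloaded, only whether a size cap applies.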

@kamtschatka
Contributor

Yeah, I don't think it makes sense to track this separately; it should be fixed as part of #775.
Safari truly is the new Internet Explorer of the Internet...

@bverkron
Author

Closing in favour of #775
