Videos do not download (or do not show up properly in UI) when CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE set to -1 #778

bverkron · 2024-12-28T01:51:58Z

Describe the Bug

When CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE is set to -1, videos downloaded from YouTube (for example this one) don't play. I get the following instead of a playable video:

Screen.Recording.2024-12-27.at.5.48.30.PM.mov

When setting CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE back to another value like 1000 it works. Default value (i.e. not including the env var) also works.

Steps to Reproduce

Set CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE to -1
Add url https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg to Hoarder
Attempt to view the video after it's downloaded / processed

Expected Behaviour

Video is playable

Screenshots or Additional Context

Log entries from successful download (CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE set to 1000)

2024-12-28T01:39:12.708Z info: [Crawler][45] Will crawl "https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg" for link with id "totdckcx63gx6xnlwtcbiozi"
2024-12-28T01:39:12.709Z info: [Crawler][45] Attempting to determine the content-type for the url https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg
2024-12-28T01:39:12.772Z info: [search][46] Attempting to index bookmark with id totdckcx63gx6xnlwtcbiozi ...
2024-12-28T01:39:12.922Z info: [search][46] Completed successfully
2024-12-28T01:39:12.992Z info: [Crawler][45] Content-type for the url https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg is "text/html; charset=utf-8"
2024-12-28T01:39:16.053Z info: [Crawler][45] Successfully navigated to "https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg". Waiting for the page to load ...
2024-12-28T01:39:21.057Z info: [Crawler][45] Finished waiting for the page to load.
2024-12-28T01:39:21.265Z info: [Crawler][45] Successfully fetched the page content.
2024-12-28T01:39:21.609Z info: [Crawler][45] Finished capturing page content and a screenshot. FullPageScreenshot: false
2024-12-28T01:39:21.619Z info: [Crawler][45] Will attempt to extract metadata from page ...
2024-12-28T01:39:26.436Z info: [Crawler][45] Will attempt to extract readable content ...
2024-12-28T01:39:29.317Z info: [Crawler][45] Done extracting readable content.
2024-12-28T01:39:29.378Z info: [Crawler][45] Stored the screenshot as assetId: 0c0b7315-29d1-488c-ab33-602b9eefd7d5
2024-12-28T01:39:29.436Z info: [Crawler][45] Done extracting metadata from the page.
2024-12-28T01:39:29.437Z info: [Crawler][45] Downloading image from "https://i.ytimg.com/vi/Lw9Y_A5rzOs/maxres2.jpg?sqp=-oaymwEoCIAKENAF8quKqQMcGADwAQH4AbYIgAKAD4oCDAgAEAEYciBLKEAwDw==&rs=AOn4CLBYL4uSqtx5DMs9e-sE5MbFW6XmtA"
2024-12-28T01:39:29.521Z info: [Crawler][45] Downloaded image as assetId: 34d0f0f2-4bfb-478b-9578-1865b673eb09
2024-12-28T01:39:29.602Z info: [Crawler][45] Completed successfully
2024-12-28T01:39:30.415Z debug: [inference][47] No inference client configured, nothing to do now
2024-12-28T01:39:30.416Z info: [inference][47] Completed successfully
2024-12-28T01:39:30.470Z info: [search][48] Attempting to index bookmark with id totdckcx63gx6xnlwtcbiozi ...
2024-12-28T01:39:30.482Z info: [VideoCrawler][49] Attempting to download a file from "https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg" to "/tmp/video_downloads/bdbaf00b-9b02-4fa4-9369-8e4e632f7c9d" using the following arguments: "https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg,-f,best[filesize<1000M],-o,/tmp/video_downloads/bdbaf00b-9b02-4fa4-9369-8e4e632f7c9d,--no-playlist"
2024-12-28T01:39:30.574Z info: [search][48] Completed successfully
2024-12-28T01:39:35.136Z info: [VideoCrawler][49] Finished downloading a file from "https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg" to "/tmp/video_downloads/bdbaf00b-9b02-4fa4-9369-8e4e632f7c9d"
2024-12-28T01:39:35.177Z info: [VideoCrawler][49] Finished downloading video from "https://youtu.be/Lw9Y_A5rzOs?si=tDY6iGdnSK_pm4vg" and adding it to the database
2024-12-28T01:39:35.178Z info: [VideoCrawler][49] Video Download Completed successfully

Log when set to -1

2024-12-28T01:44:48.903Z info: [Crawler][51] Will crawl "https://youtu.be/Lw9Y_A5rzOs?si=m1mYS19NUmXzkexr" for link with id "uskue5v4bdwpl8jzgbmcfh64"
2024-12-28T01:44:48.905Z info: [Crawler][51] Attempting to determine the content-type for the url https://youtu.be/Lw9Y_A5rzOs?si=m1mYS19NUmXzkexr
2024-12-28T01:44:49.071Z info: [search][52] Attempting to index bookmark with id uskue5v4bdwpl8jzgbmcfh64 ...
2024-12-28T01:44:49.143Z info: [Crawler][51] Content-type for the url https://youtu.be/Lw9Y_A5rzOs?si=m1mYS19NUmXzkexr is "text/html; charset=utf-8"
2024-12-28T01:44:49.151Z info: [search][52] Completed successfully
2024-12-28T01:44:51.860Z info: [Crawler][51] Successfully navigated to "https://youtu.be/Lw9Y_A5rzOs?si=m1mYS19NUmXzkexr". Waiting for the page to load ...
2024-12-28T01:44:56.861Z info: [Crawler][51] Finished waiting for the page to load.
2024-12-28T01:44:57.093Z info: [Crawler][51] Successfully fetched the page content.
2024-12-28T01:44:57.390Z info: [Crawler][51] Finished capturing page content and a screenshot. FullPageScreenshot: false
2024-12-28T01:44:57.403Z info: [Crawler][51] Will attempt to extract metadata from page ...
2024-12-28T01:45:02.529Z info: [Crawler][51] Will attempt to extract readable content ...
2024-12-28T01:45:05.685Z info: [Crawler][51] Done extracting readable content.
2024-12-28T01:45:05.745Z info: [Crawler][51] Stored the screenshot as assetId: cb93da1a-00e6-438c-af44-db72f37456a1
2024-12-28T01:45:05.789Z info: [Crawler][51] Done extracting metadata from the page.
2024-12-28T01:45:05.789Z info: [Crawler][51] Downloading image from "https://i.ytimg.com/vi/Lw9Y_A5rzOs/maxres2.jpg?sqp=-oaymwEoCIAKENAF8quKqQMcGADwAQH4AbYIgAKAD4oCDAgAEAEYciBLKEAwDw==&rs=AOn4CLBYL4uSqtx5DMs9e-sE5MbFW6XmtA"
2024-12-28T01:45:05.857Z info: [Crawler][51] Downloaded image as assetId: ccd5259b-49c8-4afc-8303-76078e2ca57d
2024-12-28T01:45:05.927Z info: [Crawler][51] Completed successfully
2024-12-28T01:45:06.777Z debug: [inference][53] No inference client configured, nothing to do now
2024-12-28T01:45:06.778Z info: [inference][53] Completed successfully
2024-12-28T01:45:06.834Z info: [search][54] Attempting to index bookmark with id uskue5v4bdwpl8jzgbmcfh64 ...
2024-12-28T01:45:06.848Z info: [VideoCrawler][55] Attempting to download a file from "https://youtu.be/Lw9Y_A5rzOs?si=m1mYS19NUmXzkexr" to "/tmp/video_downloads/454d5edb-f75d-4b56-8203-ad40613563b8" using the following arguments: "https://youtu.be/Lw9Y_A5rzOs?si=m1mYS19NUmXzkexr,-o,/tmp/video_downloads/454d5edb-f75d-4b56-8203-ad40613563b8,--no-playlist"
2024-12-28T01:45:06.937Z info: [search][54] Completed successfully

Device Details

Safari 17.6 on macOS

Exact Hoarder Version

v0.20.0

bverkron · 2024-12-28T01:54:40Z

Workaround is setting it to a very high number that will likely never be reached, like 9999999999999. Effectively the same as having no limit.

kamtschatka · 2024-12-29T21:09:00Z

Works fine for me. Have you tried other browsers, to see whether Safari maybe does not support the video format?

bverkron · 2024-12-29T21:48:17Z

It appears to work in Brave (i.e. Chromium-based), but Hoarder must be doing something different with the video (aside from the compression, I assume) when set to -1, since any value other than -1 (even an arbitrarily large one like 9999999999999) makes it playable in Safari.

kamtschatka · 2024-12-29T23:01:21Z

Hoarder merely passes this parameter to yt-dlp, which then chooses which file to download.
When -1 is provided, we skip adding the filesize filter; otherwise we add best[filesize<${maxVideoDownloadSize}M] to the arguments.
So it seems yt-dlp simply chooses a different version of the video in that case, and Safari really does have issues with some video formats.
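
The behaviour described above can be sketched roughly as follows. This is a minimal illustration reconstructed from the two VideoCrawler log lines in this issue, not Hoarder's actual source; the function name `buildYtDlpArgs` and its parameters are hypothetical.

```typescript
// Hypothetical helper (not Hoarder's real code): it reproduces the
// argument lists visible in the VideoCrawler log lines of this issue.
function buildYtDlpArgs(
  url: string,
  outputPath: string,
  maxSizeMb: number,
): string[] {
  const args: string[] = [url];
  // -1 means "no limit": the format filter is left out entirely,
  // so yt-dlp falls back to its own default format selection.
  if (maxSizeMb !== -1) {
    args.push("-f", `best[filesize<${maxSizeMb}M]`);
  }
  args.push("-o", outputPath, "--no-playlist");
  return args;
}

// With a limit, a "-f best[filesize<...M]" pair is present; with -1 it is absent.
console.log(buildYtDlpArgs("https://youtu.be/Lw9Y_A5rzOs", "/tmp/video_downloads/example", 1000).join(","));
console.log(buildYtDlpArgs("https://youtu.be/Lw9Y_A5rzOs", "/tmp/video_downloads/example", -1).join(","));
```

Note that the two cases differ by more than the size cap: with a limit, yt-dlp is asked for `best` (a single file containing both video and audio), whereas without `-f` it uses its documented default selection, which prefers the best separate video and audio streams. That would explain why even a huge limit yields a different file than -1.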

bverkron · 2024-12-30T04:27:22Z

Looks like with -1 set it's downloading the video in AV1 format. Perhaps that's the original format YouTube is serving, and with any other value passed it's converted to MP4. Either way, AV1 is relatively new and not widely supported yet, which probably explains the playback problems in Safari.

Video format with CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE = -1

Video format with CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE = 9999999999999

Video format with CRAWLER_VIDEO_DOWNLOAD_MAX_SIZE not set, I think. Same result as 9999999999999

This kind of issue may be solved indirectly if #775 were to implement options to control the resolution, format, etc. Hoarder could then be configured to produce consistent formats, avoiding these kinds of inconsistencies.
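
Purely as an illustration of what such an option might look like (this is an assumption, not what #775 proposes or what Hoarder implements): a format selector could always be passed to yt-dlp, with an optional codec filter added to it. The `VideoFormatOptions` interface, its fields, and `buildFormatSelector` are hypothetical; the `vcodec` field and the `^=` "starts with" operator are part of yt-dlp's format filter syntax.

```typescript
// Hypothetical options; none of these exist in Hoarder as of this issue.
interface VideoFormatOptions {
  maxSizeMb: number; // -1 means no size limit
  vcodecPrefix?: string; // e.g. "avc1" to ask for H.264 video
}

// Builds a yt-dlp format selector that always starts from "best", so the
// size limit only narrows the choice instead of switching selection modes.
function buildFormatSelector(opts: VideoFormatOptions): string {
  let selector = "best";
  if (opts.vcodecPrefix) {
    selector += `[vcodec^=${opts.vcodecPrefix}]`;
  }
  if (opts.maxSizeMb !== -1) {
    selector += `[filesize<${opts.maxSizeMb}M]`;
  }
  return selector;
}

console.log(buildFormatSelector({ maxSizeMb: -1, vcodecPrefix: "avc1" }));
console.log(buildFormatSelector({ maxSizeMb: 1000 }));
```

In this sketch, -1 and a very large limit would draw from the same pool of formats, so the choice of codec would no longer depend on whether a size limit happens to be set.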

kamtschatka · 2024-12-30T13:52:25Z

Yeah, I don't think it makes sense to track this separately; it should be fixed as part of #775.
Safari truly is the new Internet Explorer of the Internet...

bverkron · 2024-12-30T19:26:50Z

Closing in favour of #775

bverkron closed this as completed Dec 30, 2024

MohamedBassem mentioned this issue Dec 31, 2024

Firefox archived video unsupported format and MIME type found #792
