Pages with an embedded YouTube player are not playing the whole video anymore #181

Closed
benoit74 opened this issue Jun 21, 2024 · 3 comments

@benoit74

It looks like the YouTube player has been significantly modified, and a WARC of a page with an embedded YouTube video no longer seems to work.

For instance, https://tmp.kiwix.org/ci/test-warc/100r.co/crawl-100r-orca-20240528.warc.gz works properly on replayweb.page, but https://tmp.kiwix.org/ci/test-warc/100r.co/crawl-orca-20240620.warc.gz plays only the first 4 seconds of the video and then fails.

I've already gathered quite a lot of details and investigation notes in openzim/zimit#323.

In short, it looks like the player no longer issues Range requests to fetch the video. Instead, it issues multiple regular GET requests with the range specified in a query parameter. Unfortunately, the range is highly dynamic and depends on environmental factors I have not been able to identify, so replayers fail to find the matching record at replay time, and adapting the fuzzy matching rules is not enough to fix the problem.
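To make the failure mode concrete, here is a minimal sketch of a replay lookup. Everything in it is an assumption for illustration: the `video.example` URLs, the parameter name `range`, and the helper names `exactLookup`, `stripRange` and `fuzzyLookup` are hypothetical and are not taken from any real replayer.

```javascript
// Hypothetical capture index: request URL -> description of the recorded body.
const captured = new Map([
  ["https://video.example/playback?id=abc&range=0-65535", "bytes 0-65535"],
  ["https://video.example/playback?id=abc&range=65536-131071", "bytes 65536-131071"],
]);

// Exact lookup: only succeeds when the replayed URL equals a captured URL.
function exactLookup(url) {
  return captured.get(url) ?? null;
}

// Remove the (assumed) range query parameter so URLs can be compared without it.
function stripRange(url) {
  const u = new URL(url);
  u.searchParams.delete("range");
  return u.toString();
}

// Fuzzy lookup: return the first captured record whose URL matches once the
// range parameter is ignored.
function fuzzyLookup(url) {
  const key = stripRange(url);
  for (const [capturedUrl, body] of captured) {
    if (stripRange(capturedUrl) === key) return body;
  }
  return null;
}

// At replay time the player asks for a different range than at capture time.
const replayRequest = "https://video.example/playback?id=abc&range=0-48211";

console.log(exactLookup(replayRequest)); // null: no record for this exact URL
console.log(fuzzyLookup(replayRequest)); // "bytes 0-65535": found, but not the bytes requested
```

The exact lookup misses because the range differs, and the fuzzy lookup finds a record that does not contain the byte span the player asked for, which is why loosening the matching rules alone does not help.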

@ikreymer
Member

Yes, unfortunately, the whole player has been changed, so the previous rewriting injections no longer work at all. A fix will be available shortly. The most fail-safe approach for now is probably to disable MediaSource-based playback, which allows the player to fall back to mp4 streaming. The fix being tested injects the following into youtube.com HTML pages:

<script>window.MediaSource.isTypeSupported = () => false;</script>
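As a rough illustration of how such a snippet could be injected with a regex-based HTML rewrite, here is a minimal sketch. The function name `injectAfterHead` and the constant `DISABLE_MSE` are hypothetical, and this is not the actual rewriter from the fix; it only shows the general idea of placing the script early enough to run before the player's own code.

```javascript
// The snippet from the comment above. Overriding the static
// MediaSource.isTypeSupported method makes every codec check return false.
const DISABLE_MSE =
  "<script>window.MediaSource.isTypeSupported = () => false;</script>";

// Hypothetical helper: insert `snippet` right after the opening <head> tag,
// or prepend it when the document has no <head> tag.
function injectAfterHead(html, snippet = DISABLE_MSE) {
  // Matches <head> or <head ...attrs>, but not <header>.
  const headOpen = /<head(\s[^>]*)?>/i;
  if (headOpen.test(html)) {
    // A replacer function avoids special handling of "$" in the snippet.
    return html.replace(headOpen, (match) => match + snippet);
  }
  return snippet + html;
}

const page = "<html><head><title>Video</title></head><body></body></html>";
console.log(injectAfterHead(page));
```

Placing the script directly after the opening `<head>` tag is a design choice in this sketch: the override has to be in place before any player script queries `MediaSource.isTypeSupported`.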

@benoit74
Author

OK, thank you very much for the hint, I will try this as well. What you say makes sense, but it is far from what I could have found on my own! So double thank you ^^

ikreymer added a commit that referenced this issue Jun 21, 2024
- Add a new rewriter type with HTML-only rewrite rules.
- Export HTML Rx rewriter, to be used at capture time.
- For now, only to be used externally (in Browsertrix Crawler and ArchiveWeb.page). Necessary to fix YouTube capture & replay issues.

Addresses #181
@ikreymer
Member

Fixed via #182

@ikreymer ikreymer self-assigned this Jun 26, 2024
@ikreymer ikreymer moved this from Triage to Done! in Webrecorder Projects Jun 26, 2024