[Bug]: Using browser profiles results in crawl failures #1364

tw4l · 2023-11-09T22:45:27Z

Browsertrix Cloud Version

v1.8.0-beta.2-3aebf2e

What did you expect to happen? What happened instead?

I expect crawls with a custom browser profile to run successfully. Instead, the crawl fails with a cryptic S3 error in the crawler logs, because the crawler looks for the profile at <s3 endpoint>/<oid>/profiles/<filename> while the file is actually stored at <s3 endpoint>/<oid>/<filename>.

Step-by-step reproduction instructions

1. Create a new browser profile
2. Attempt to run a crawl with that browser profile
3. Watch the crawl fail

Additional details

It seems the profiles/ path prefix for profiles got dropped at some point in the storage refactoring work (PR #1296). To fix, we'll add the prefix back and move any affected files in the S3 buckets.

Fixes #1364

Regression fix for issue introduced in storage refactoring (see issue for more details).

Changes:

1. Add `profiles/` prefix to profile filename passed in to crawler for profile creation and written into db
2. Remove hardcoded `profiles/` prefix from crawler YAML
3. Add migration to add `profiles/` prefix to profile filenames that don't already have it, including updating PROFILE_FILENAME in ConfigMaps

This way, between the related storage document and the profile filename, we have the full path to the object in the database rather than relying on additional prefixes hardcoded into k8s job YAML files.

Note that as a follow-up, it'll be necessary to manually move any profiles that had been written into the `<oid>` "directory" in object storage rather than `<oid>/profiles` to the latter. This should only affect profiles created very recently in a 1.8.0-beta release.
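As a rough illustration of the migration rule and the follow-up move described above, here is a minimal Python sketch. It is not the code from the PR: the function names `add_profiles_prefix` and `plan_move`, the constant `PROFILES_PREFIX`, and the sample filename are all hypothetical, and the sketch only computes strings; it does not touch the database, ConfigMaps, or object storage.

```python
# Hypothetical sketch, not the actual Browsertrix Cloud migration code.
PROFILES_PREFIX = "profiles/"


def add_profiles_prefix(filename: str) -> str:
    """Return the filename with the profiles/ prefix, adding it only if missing."""
    if filename.startswith(PROFILES_PREFIX):
        return filename
    return PROFILES_PREFIX + filename


def plan_move(oid: str, filename: str) -> tuple[str, str]:
    """Return (old_key, new_key) for a profile written directly under <oid>/.

    Assumes object keys have the form <oid>/<filename>, as described in the issue.
    """
    bare = filename.removeprefix(PROFILES_PREFIX)
    return f"{oid}/{bare}", f"{oid}/{add_profiles_prefix(bare)}"


if __name__ == "__main__":
    # "profile-abc.tar.gz" and "my-oid" are made-up example values.
    print(add_profiles_prefix("profile-abc.tar.gz"))
    print(plan_move("my-oid", "profile-abc.tar.gz"))
```

Because the prefix is added only when it is missing, the rule can be applied repeatedly without producing `profiles/profiles/...`, which matters for filenames that were already stored correctly.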

tw4l added the bug Something isn't working label Nov 9, 2023

github-project-automation bot added this to Webrecorder Projects Nov 9, 2023

github-project-automation bot moved this to Triage in Webrecorder Projects Nov 9, 2023

tw4l self-assigned this Nov 9, 2023

tw4l moved this from Triage to Dev In Progress in Webrecorder Projects Nov 9, 2023

ikreymer moved this from Dev In Progress to Todo in Webrecorder Projects Nov 9, 2023

Shrinks99 changed the title ~~[Bug]:~~ [Bug]: Browser Profiles result in crawl failures Nov 9, 2023

Shrinks99 changed the title ~~[Bug]: Browser Profiles result in crawl failures~~ [Bug]: Using browser profiles results in crawl failures Nov 9, 2023

tw4l mentioned this issue Nov 9, 2023

Regression fix: add profiles/ prefix to profile filenames #1365

Merged

tw4l moved this from Todo to Dev In Progress in Webrecorder Projects Nov 9, 2023

tw4l moved this from Dev In Progress to PR In Review in Webrecorder Projects Nov 9, 2023

ikreymer closed this as completed in #1365 Nov 10, 2023

github-project-automation bot moved this from PR In Review to Done! in Webrecorder Projects Nov 10, 2023
