[Bug]: Using browser profiles results in crawl failures #1364

Closed
tw4l opened this issue Nov 9, 2023 · 0 comments · Fixed by #1365
Labels
bug Something isn't working

Comments

@tw4l (Member) commented Nov 9, 2023

Browsertrix Cloud Version

v1.8.0-beta.2-3aebf2e

What did you expect to happen? What happened instead?

I expect crawls that use a custom browser profile to run successfully. Instead, the crawl fails with a cryptic S3 error in the crawler logs, because the crawler looks for the profile at `<s3 endpoint>/<oid>/profiles/<filename>` but the file is actually stored at `<s3 endpoint>/<oid>/<filename>`.
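A minimal sketch of the mismatch, assuming object keys are built by plain string joining and leaving out the `<s3 endpoint>` part. The functions `writer_key` and `crawler_key` are hypothetical and do not come from the Browsertrix Cloud codebase; they only model where the profile was written versus where the crawler looked for it.

```python
def writer_key(oid: str, filename: str) -> str:
    # Hypothetical: after the storage refactor, the profile lands directly under <oid>/
    return f"{oid}/{filename}"


def crawler_key(oid: str, filename: str) -> str:
    # Hypothetical: the crawler side still prepends a hardcoded "profiles/" prefix
    return f"{oid}/profiles/{filename}"


oid = "example-org-id"
filename = "profile-1234.tar.gz"
print(writer_key(oid, filename))   # example-org-id/profile-1234.tar.gz
print(crawler_key(oid, filename))  # example-org-id/profiles/profile-1234.tar.gz
print(writer_key(oid, filename) == crawler_key(oid, filename))  # False
```

Because the two keys never match, the crawler's read of the profile object fails, which surfaces as the S3 error in the logs.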

Step-by-step reproduction instructions

  1. Create a new browser profile
  2. Attempt to run a crawl with that browser profile
  3. Watch the crawl fail

Additional details

It seems the `profiles/` path prefix for browser profiles was dropped at some point during the storage refactoring work (PR #1296). To fix this, we'll add the prefix back and move any affected files in the S3 buckets.

@tw4l added the bug label Nov 9, 2023
@tw4l self-assigned this Nov 9, 2023
@tw4l moved this from Triage to Dev In Progress in Webrecorder Projects Nov 9, 2023
@ikreymer moved this from Dev In Progress to Todo in Webrecorder Projects Nov 9, 2023
@Shrinks99 changed the title from "[Bug]: " to "[Bug]: Browser Profiles result in crawl failures" Nov 9, 2023
@Shrinks99 changed the title from "[Bug]: Browser Profiles result in crawl failures" to "[Bug]: Using browser profiles results in crawl failures" Nov 9, 2023
@tw4l moved this from Todo to Dev In Progress in Webrecorder Projects Nov 9, 2023
@tw4l moved this from Dev In Progress to PR In Review in Webrecorder Projects Nov 9, 2023
ikreymer pushed a commit that referenced this issue Nov 10, 2023
Fixes #1364 

Regression fix for issue introduced in storage refactoring (see issue
for more details).

Changes:
1. Add `profiles/` prefix to profile filename passed in to crawler for
profile creation and written into db
2. Remove hardcoded `profiles/` prefix from crawler YAML
3. Add migration to add `profiles/` prefix to profile filenames that
don't already have it, including updating PROFILE_FILENAME in ConfigMaps
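The migration in item 3 can be sketched as an idempotent transform. This is a minimal illustration, assuming profiles are flat dicts with a `filename` key and that a ConfigMap's data is a plain `dict[str, str]`; the helper names `add_profiles_prefix`, `migrate_profiles`, and `migrate_configmap` are hypothetical, not the actual migration code from #1365.

```python
from typing import Any

PREFIX = "profiles/"


def add_profiles_prefix(filename: str) -> str:
    # Idempotent: filenames that already carry the prefix are returned unchanged
    if filename.startswith(PREFIX):
        return filename
    return PREFIX + filename


def migrate_profiles(profiles: list[dict[str, Any]]) -> int:
    """Prefix each profile's stored filename in place; return how many changed."""
    changed = 0
    for profile in profiles:
        old = profile["filename"]
        new = add_profiles_prefix(old)
        if new != old:
            profile["filename"] = new
            changed += 1
    return changed


def migrate_configmap(data: dict[str, str]) -> bool:
    """Prefix PROFILE_FILENAME in a ConfigMap's data if it is set; return True if changed."""
    old = data.get("PROFILE_FILENAME")
    if not old:
        return False  # crawl configs without a profile are left alone
    new = add_profiles_prefix(old)
    data["PROFILE_FILENAME"] = new
    return new != old


profiles = [{"filename": "a.tar.gz"}, {"filename": "profiles/b.tar.gz"}]
print(migrate_profiles(profiles))  # 1
print(profiles[0]["filename"])     # profiles/a.tar.gz
cm = {"PROFILE_FILENAME": "a.tar.gz"}
print(migrate_configmap(cm), cm["PROFILE_FILENAME"])  # True profiles/a.tar.gz
```

Keeping the transform idempotent means re-running the migration on already-prefixed records is a no-op.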

This way, between the related storage document and the profile filename,
the database holds the full path to the object, rather than relying on
additional prefixes hardcoded into k8s job YAML files.
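A minimal sketch of the resulting design, assuming the storage document supplies a bucket endpoint and the org id, and the stored filename already includes `profiles/`. `StorageRef` and `profile_object_url` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StorageRef:
    # Hypothetical stand-in for the storage document linked to a profile
    endpoint: str  # base URL of the bucket, e.g. "https://s3.example.com/bucket/"
    oid: str       # organization id, used as the top-level "directory"


def profile_object_url(storage: StorageRef, filename: str) -> str:
    # The DB filename already carries "profiles/", so nothing else is
    # prepended here; there is no second prefix living in job YAML
    base = storage.endpoint.rstrip("/")
    return f"{base}/{storage.oid}/{filename}"


ref = StorageRef(endpoint="https://s3.example.com/bucket/", oid="org1")
print(profile_object_url(ref, "profiles/p.tar.gz"))
# https://s3.example.com/bucket/org1/profiles/p.tar.gz
```

With the full relative path stored in one place, the writer and the crawler derive the same key from the same data, so they cannot drift apart the way they did after the refactor.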

Note that, as a follow-up, it'll be necessary to manually move any
profiles that had been written into the `<oid>` "directory" in object
storage, rather than `<oid>/profiles`, to the latter. This should only
affect profiles created very recently in a 1.8.0-beta release.
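The manual follow-up can be planned before touching the bucket. This minimal sketch assumes you already have the set of existing object keys and the migrated (prefixed) filenames from the database; `plan_profile_moves` is a hypothetical helper, not part of Browsertrix Cloud.

```python
PREFIX = "profiles/"


def plan_profile_moves(
    oid: str, existing_keys: set[str], db_filenames: list[str]
) -> list[tuple[str, str]]:
    """Return (source, destination) key pairs for profiles stored under <oid>/."""
    moves = []
    for filename in db_filenames:
        if not filename.startswith(PREFIX):
            continue  # only consider filenames the migration has already prefixed
        dest = f"{oid}/{filename}"
        src = f"{oid}/{filename[len(PREFIX):]}"
        # Move only if the object sits at the old location and not yet at the new one
        if src in existing_keys and dest not in existing_keys:
            moves.append((src, dest))
    return moves


keys = {"org1/a.tar.gz", "org1/profiles/b.tar.gz"}
print(plan_profile_moves("org1", keys, ["profiles/a.tar.gz", "profiles/b.tar.gz"]))
# [('org1/a.tar.gz', 'org1/profiles/a.tar.gz')]
```

S3 has no native rename, so each planned pair would typically be carried out as a copy followed by a delete (for example with boto3's `copy_object` and `delete_object` client methods), ideally after verifying the copy succeeded.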
@github-project-automation bot moved this from PR In Review to Done! in Webrecorder Projects Nov 10, 2023