[Bug]: Error creating WACZ #2095
Comments
#1137 might be a solution, if that feature request were implemented.
After a bit of reverse engineering, I found an undocumented s3 field. I think there should be more than one attempt to upload the WACZ, and if the upload ultimately fails, the rest of the crawl should be aborted, since the crawl data is lost.
Yes, the access_endpoint_url is designed for something like this. It would be odd for the minio instance not to be found while the crawler is able to run.
Re: the DNS issue, I'd be surprised if it's anything related to resource exhaustion, since the upload generally happens when the browser is already shut down. Can the crawler resolve the minio hostname when it starts running? You can exec into the crawler and see if it can reach the minio node.
Probably what we should do is check that the upload endpoint is available when starting the crawl, and fail immediately if it is not. We'll probably add this (in the crawler repo). I believe the crawler pod should be retrying a few times automatically; likely the DNS issue is still not resolved at that point, so it keeps failing.
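To make the "check the upload endpoint when starting the crawl" idea concrete, here is a minimal sketch in Python. It is not the crawler's actual code: the function name `endpoint_reachable` and the example URL are hypothetical. It only checks that the hostname resolves and that a TCP connection can be opened, which would be enough to catch the DNS failure described in this issue before a long crawl starts.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(endpoint_url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint's host resolves and accepts a TCP connection.

    Hypothetical helper for illustration; it does not validate credentials
    or bucket access, only DNS resolution and basic connectivity.
    """
    try:
        parsed = urlparse(endpoint_url)
        host = parsed.hostname
        if host is None:
            return False
        # Fall back to the scheme's default port when none is given.
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
        # create_connection resolves the name (DNS) and then opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers DNS failures (socket.gaierror), timeouts, and refused
        # connections; ValueError covers a malformed port in the URL.
        return False
```

A crawl launcher could call this once with the configured storage endpoint (for example a hypothetical `https://minio.example.internal:9000`) and exit with an error before crawling if it returns `False`.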
Browsertrix Version
v1.11.7-7a61568
What did you expect to happen? What happened instead?
I am having some DNS issues, probably from resource exhaustion. (Also filed #2094 to allow cpu_limits on crawler)
When I see this error, the entire crawl is lost and that is frustrating when the crawl has run for 24 hours. I wish that the WACZ upload was attempted multiple times until the upload eventually completes or some threshold is met.
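The behavior I am asking for could look roughly like the sketch below: retry the upload with exponential backoff until it succeeds or an attempt limit is reached. This is only an illustration of the request, not how Browsertrix or the crawler implements uploads. `upload_with_retries`, its parameters, and the `upload` callable are all hypothetical names, and I am assuming that network and DNS failures surface as `OSError`.

```python
import time
from typing import Callable


def upload_with_retries(
    upload: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call upload() until it succeeds or max_attempts is reached.

    Returns the number of attempts used. If every attempt fails, the last
    OSError is re-raised so the caller can abort the crawl.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            upload()
            return attempt
        except OSError:
            # Assumption: DNS and connection failures are raised as OSError.
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

With the defaults above, a transient DNS outage of under 30 seconds (2 + 4 + 8 + 16 seconds of waiting) would no longer cost the whole crawl.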
Reproduction instructions
Not sure. I'm using kind 0.24.0. The cluster config is standard; it just opens the nodeport. I'm using an external minio s3 instance. The minio s3 instance has to be behind HTTPS for replays to work, so I cannot provide the IP address.
Screenshots / Video
No response
Environment
No response
Additional details
I've tried every workaround that I could imagine.