Jaeger (and presumably similar) testcontainers test flaky #1871
Comments
Or migrate to quay.io, which should not have pull limits.
Yeah, there are a few options for a container registry - the problem (maybe not a big one) is that we'd effectively be rehosting the official image, and it'd be nice to avoid that if possible, I think.
I hit this on the instrumentation repo locally, with an image we host on Bintray, I believe. Maybe a bug in Testcontainers? /cc @iNikem
Without further digging, it seems to me that the problem is with container start, not image pull.
Yeah, you're right - I realized it doesn't usually actually pull this image since it's already cached locally (which was why my tests were reliably failing before 😅). It's probably good to set the image pull policy to always.
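For reference, a minimal sketch of what that could look like with the Testcontainers Java API; the image name here is just a placeholder, not necessarily the one the build actually uses:

```java
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.images.PullPolicy;

// Force a fresh pull on every run instead of reusing the locally cached image.
// "jaegertracing/all-in-one:1.21" is a placeholder; substitute the image the tests use.
GenericContainer<?> jaeger =
    new GenericContainer<>("jaegertracing/all-in-one:1.21")
        .withImagePullPolicy(PullPolicy.alwaysPull());
```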
Noticed that even for the ones where I raised the timeout from 1 to 2 minutes, it still fails pretty frequently, both on CI and on my MacBook. Wonder what's up.
Interesting. These never fail for me locally on my MBP. I wonder what's different.
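Raising the startup timeout is a one-liner on the wait strategy; a rough sketch, assuming a plain GenericContainer with the default listening-port wait (image name and ports are illustrative, not taken from the project's actual test setup):

```java
import java.time.Duration;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.wait.strategy.Wait;

// Bump the container startup timeout from the 1-minute default to 2 minutes.
GenericContainer<?> jaeger =
    new GenericContainer<>("jaegertracing/all-in-one:1.21")
        .withExposedPorts(14250, 16686)
        .waitingFor(Wait.forListeningPort().withStartupTimeout(Duration.ofMinutes(2)));
```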
And now I've jinxed it and they're failing for me a ton as well. 🤕
I've been looking into this and thought that I would add some color for future us. To reproduce, I have been doing:
and if you run that a bunch you might end up seeing an error like the above. It's very inconsistent. To run in a loop until failure:
When there are failures, the container logs look like this:
I don't know what's going on with that, but it almost looks like the client is sending a broken/truncated request. Anyway, just sharing in case this triggers ideas in others.
@breedx-splk Thanks a lot for the detailed investigation! I have a hunch the problem is that we're sending an HTTP/1 health check to the gRPC port rather than the HTTP port. It's interesting that this sometimes works sporadically, but let me try changing the port and see what happens.
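A rough sketch of what that change might look like, assuming Jaeger's default ports (14250 for the gRPC collector, 14269 for the HTTP admin/health endpoint) and the standard Testcontainers HTTP wait strategy; the image name and ports are illustrative rather than copied from the actual test:

```java
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.wait.strategy.Wait;

// Point the readiness probe at Jaeger's HTTP admin endpoint instead of the gRPC
// collector port, which can't answer a plain HTTP/1 health check.
GenericContainer<?> jaeger =
    new GenericContainer<>("jaegertracing/all-in-one:1.21")   // placeholder image/tag
        .withExposedPorts(14250, 14269)
        .waitingFor(Wait.forHttp("/").forPort(14269).forStatusCode(200));
```

With an HTTP target the wait strategy gets a real 200 back once the collector is up, rather than depending on how the gRPC server happens to react to a malformed HTTP/1 request.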
Lately I see the Jaeger tests fail with a timeout relatively frequently - I've even noticed this on my machine, not just on CI. Docker Hub has been introducing pull limits from what I understand, and perhaps it's affecting us. We probably want to cache .docker or whatever to reduce Docker pulls because of the rate limiting, though that wouldn't solve the problem for a new contributor. Another option is to rehost on ghcr or Bintray, but that doesn't seem great either.