
Caption Recognition Model Download Timing Out at 30% (during the Docker image build process -- Github Actions) #900

Open
shahdyousefak opened this issue Oct 18, 2024 · 0 comments
Assignees
Labels
bug Something isn't working

Comments


shahdyousefak commented Oct 18, 2024

The caption recognition preprocessor currently fails to download its model during the Docker build. The failing command is in /preprocessors/caption-recognition/Dockerfile:

RUN wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP/blip_coco_caption_base.pth -P /home/python/.cache/torch/hub/checkpoints
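One mitigation worth trying before any larger change is making the download itself more resilient. The sketch below is an assumption, not a confirmed fix: it adds retry and resume flags to the same `wget` call so a stalled transfer can pick up from the partial file instead of failing the whole build. Flag availability depends on the `wget` build in the base image.

```dockerfile
# Sketch: retry a stalled download and resume from the partial file.
# --continue (-c) resumes a partial .pth file, --tries retries on failure,
# --timeout bounds how long a stall is tolerated before retrying.
RUN wget --tries=5 --continue --timeout=60 \
    https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP/blip_coco_caption_base.pth \
    -P /home/python/.cache/torch/hub/checkpoints
```

If the stall is caused by runner-side rate limiting rather than a transient network issue, retries may only delay the same failure, but this is cheap to test.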

We also attempted to download the model from the Pegasus server, but both attempts yield the same result:

#8 12.51 791150K .......... .......... .......... .......... .......... 30% 120M 28s

The download consistently stalls at 30%, then times out, causing the caption recognition build to fail.

This issue may be related to rate limiting by the server, or to limitations GitHub imposes on free accounts, but a more thorough investigation is required.

(Pasting from my conversation with @jeffbl)
so if the problem is the large 2.5GB checkpoint file, I think the issue is one of:

  • we were rate-limited by the server holding the checkpoint. This seemed likely when we were pulling from someone else's server, but you're seeing it from Pegasus as well. You can try doing a wget on that file and see if there are any perf issues; otherwise I think we can rule out server-side limiting. --> I was able to execute the wget command manually and the model downloaded without any issues, which suggests it is not a server performance issue.
  • it is a limitation of the build system on github, esp. for a free account. maybe they do rate limiting.

Options I can think of (probably there are more):

  1. Remove the GitHub build for this, build the image locally on unicorn, and push the image to GitHub (similar limits?) or someplace else.
  2. Move the checkpoint download out of the image build and into container startup. There are many downsides to this, but it would work for our use since everything is on the local network; startup time for that container will be bad.
  3. We don't use this in production. An LMM would likely do better at this point anyway, so maybe we should just deprecate this.
  4. Turn off GitHub Actions for this specific preprocessor, note in the README that it is not automatically built, and open a GitHub item to fix it once we figure out the best solution.
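Option 2 above could be sketched as an entrypoint script that fetches the checkpoint on container startup only if it is not already present. This is a hypothetical sketch, not the preprocessor's actual entrypoint: the `CHECKPOINT_DIR`/`CHECKPOINT_URL` environment overrides are assumptions added for illustration, while the default path and URL are taken from the Dockerfile above.

```shell
#!/bin/sh
# Sketch of option 2: download the BLIP checkpoint at container startup
# instead of at image build time. Skips the download when the file already
# exists, so restarts on the local network pay the cost only once.
# CHECKPOINT_DIR / CHECKPOINT_URL overrides are illustrative assumptions.

ensure_checkpoint() {
    ckpt_dir=${CHECKPOINT_DIR:-/home/python/.cache/torch/hub/checkpoints}
    ckpt="$ckpt_dir/blip_coco_caption_base.pth"
    url=${CHECKPOINT_URL:-https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP/blip_coco_caption_base.pth}

    if [ ! -f "$ckpt" ]; then
        mkdir -p "$ckpt_dir"
        # -c resumes a partial file left by a previous failed start
        wget -c --tries=5 "$url" -O "$ckpt"
    fi
}
```

A real entrypoint would call `ensure_checkpoint` and then `exec` the preprocessor process. The main trade-off, as noted above, is bad startup time for the container on first run and a runtime dependency on the checkpoint host being reachable.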