
Container fragility #24

Closed
marcverhagen opened this issue Nov 6, 2023 · 5 comments

Comments

@marcverhagen
Contributor

marcverhagen commented Nov 6, 2023

Because

I am not sure how to replicate this, but on one of my machines the Docker build would fail during the pip install with a "Connection reset by peer" message, typically while installing torch. The failure went away once I split the requirements file into three files, with the first one installing torch, torchvision and torchmetrics. But that is hardly a satisfactory solution.
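For reference, the split looks roughly like the sketch below (file names are assumptions, and two files are shown for brevity where the actual split used three); the point is that the large torch wheels get their own cached layer, so a transient network failure does not restart the entire install:

```dockerfile
# Hypothetical split: install the heavy torch wheels in their own layer first.
COPY requirements-torch.txt .
RUN pip install -r requirements-torch.txt   # torch, torchvision, torchmetrics
# Remaining dependencies go in a later layer; a "Connection reset by peer"
# here no longer forces a re-download of the torch wheels above.
COPY requirements.txt .
RUN pip install -r requirements.txt
```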

Possibly related: on another of my three machines I also get "Connection reset by peer" when downloading "https://download.pytorch.org/models/vgg16-397923af.pth". That download should probably be made part of the image building process.

It is also not totally clear to me what the most efficient build would be. We currently use the clams-python-opencv4-torch2 base image, but with the current requirements all of torch and CUDA will be reinstalled because that base is on torch==2.0.1, resulting in an 11.4GB image. I tried not to reinstall torch/CUDA, which should be possible by pinning torchvision==0.15, but that failed with obscure messages.

This leads me to believe that I should use the clams-python-opencv4 base image instead.

Finally, and this should probably also be its own issue in clams-python, the torch images (and probably some others as well) are larger than needed because a pip cache is kept in /root/.cache/pip, which holds 2.6GB of data; the images could therefore be much smaller. Using the following does create a much smaller image:

RUN pip install --no-cache-dir torch==2.1.0
RUN pip install --no-cache-dir torchvision==0.16.0

Done when

No response

Additional context

No response

@marcverhagen
Contributor Author

After some experimenting it looks like the following works reasonably well:

FROM ghcr.io/clamsproject/clams-python-opencv4:1.0.9

ARG CLAMS_APP_VERSION
ENV CLAMS_APP_VERSION ${CLAMS_APP_VERSION}

RUN apt-get update && apt-get install -y wget

RUN pip install --no-cache-dir torch==2.1.0
RUN pip install --no-cache-dir torchvision==0.16.0

WORKDIR /app

RUN wget https://download.pytorch.org/models/vgg16-397923af.pth
RUN mkdir -p /root/.cache/torch/hub/checkpoints
RUN mv vgg16-397923af.pth /root/.cache/torch/hub/checkpoints

COPY . /app

CMD ["python3", "app.py", "--production"]
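An alternative to the wget/mkdir/mv lines would be to let torchvision populate the cache itself at build time; instantiating the model with pretrained weights downloads the checkpoint into /root/.cache/torch/hub/checkpoints. This is a sketch, not tested in this image, and assumes torchvision>=0.13 for the weights enum:

```dockerfile
# Hypothetical alternative: torchvision downloads vgg16-397923af.pth into
# its own cache directory, so no wget/mkdir/mv is needed.
RUN python3 -c "from torchvision.models import vgg16, VGG16_Weights; vgg16(weights=VGG16_Weights.IMAGENET1K_V1)"
```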

@keighrim
Member

What is the advantage of pip-installing torch/torchvision directly instead of via requirements.txt? I'm working on #30, which now additionally needs a YAML handler as a dependency. I wonder whether we should keep two (identical) dependency specs (one in the Dockerfile and one in requirements.txt) unless there's a clear benefit to doing so.

Also, for backbone models, I think there must be a better way to download a pth file based on the model choice of ours, instead of hard-coding vgg URL (or other better performing models) manually.

@keighrim
Member

For model download, we can do something like this.

Given a model config YAML file with a `model_type` key, in backbones.py:

...
if __name__ == "__main__":
    import sys
    # pass the model choice via the CLI
    model_map[sys.argv[1]]()
    # Instantiating an `ExtractorModel` will also download the pth file and
    # initiate the torchvision model. When this code terminates, the
    # downloaded model file stays in the local cache directory.

then in dockerfile

...
# tr strips the space that cut leaves after the colon
RUN python -m modeling.backbones $(grep "model_type" modeling/classifier-config.yaml | cut -d: -f2 | tr -d ' ')

CMD ["python3", "app.py", "--production"]
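For concreteness, the `model_map` and `ExtractorModel` referenced above might look like this minimal sketch; the names follow the snippet above, but the class body and the URL mapping are assumptions (the real class would build the torchvision model and download its weights on instantiation):

```python
from dataclasses import dataclass

@dataclass
class ExtractorModel:
    """Stand-in for the real extractor class: holds a model name and its
    checkpoint URL. The real implementation would download the .pth file
    and construct the torchvision model when instantiated."""
    name: str
    checkpoint_url: str

# Map a model_type string from the config to a factory for that model.
model_map = {
    "vgg16": lambda: ExtractorModel(
        "vgg16", "https://download.pytorch.org/models/vgg16-397923af.pth"),
}

if __name__ == "__main__":
    import sys
    # pass the model choice via the CLI, e.g. `python backbones.py vgg16`
    model = model_map[sys.argv[1]]()
    print(model.name)
```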

@marcverhagen
Contributor Author

Maybe a little bit ugly, with the grep inside the container file and because it requires the backbones file to cater to the container file, but we can play with that a bit. Definitely better than

RUN wget https://download.pytorch.org/models/vgg16-397923af.pth

@keighrim
Member

Fixed via #48.
