
Container fragility #24

Closed
marcverhagen opened this issue Nov 6, 2023 · 5 comments

Comments

@marcverhagen
Contributor

marcverhagen commented Nov 6, 2023

Because

I am not sure how to replicate this, but on one of my machines the Docker build would fail during the pip install with a "Connection reset by peer" message, typically while installing torch. The failure went away once I split the requirements file into three files, with the first one installing torch, torchvision and torchmetrics. But that is hardly a satisfactory solution.
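For reference, the split looks roughly like the sketch below (file names are assumptions, and two files are shown for brevity where the actual split used three); the point is that the large torch wheels get their own cached layer, so a transient network failure does not restart the entire install:

```dockerfile
# Hypothetical split: install the heavy torch wheels in their own layer first.
COPY requirements-torch.txt .
RUN pip install -r requirements-torch.txt   # torch, torchvision, torchmetrics
# Remaining dependencies go in a later layer; a "Connection reset by peer"
# here no longer forces a re-download of the torch wheels above.
COPY requirements.txt .
RUN pip install -r requirements.txt
```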

Possibly related: on another of my three machines I also get "Connection reset by peer" when downloading "https://download.pytorch.org/models/vgg16-397923af.pth". That download should probably be made part of the image building process.

It is also not totally clear to me what the most efficient build would be. We currently use the clams-python-opencv4-torch2 base image, but with the current requirements all of torch and CUDA will be reinstalled because that base is on torch==2.0.1, resulting in an 11.4GB image. I tried not to reinstall torch/CUDA, which should be possible by pinning torchvision==0.15, but that failed with obscure messages.

This leads me to believe that I should use the clams-python-opencv4 base image instead.

Finally, and this should probably also be its own issue in clams-python, the torch images (and probably some others as well) are larger than needed because a pip cache is kept in /root/.cache/pip, which holds 2.6GB of data; the images could therefore be much smaller. Using the following does create a much smaller image:

RUN pip install --no-cache-dir torch==2.1.0
RUN pip install --no-cache-dir torchvision==0.16.0

Done when

No response

Additional context

No response

@marcverhagen
Contributor Author

After some experimenting it looks like the following works reasonably well:

FROM ghcr.io/clamsproject/clams-python-opencv4:1.0.9

ARG CLAMS_APP_VERSION
ENV CLAMS_APP_VERSION ${CLAMS_APP_VERSION}

RUN apt-get update && apt-get install -y wget

RUN pip install --no-cache-dir torch==2.1.0
RUN pip install --no-cache-dir torchvision==0.16.0

WORKDIR /app

RUN wget https://download.pytorch.org/models/vgg16-397923af.pth
RUN mkdir -p /root/.cache/torch/hub/checkpoints
RUN mv vgg16-397923af.pth /root/.cache/torch/hub/checkpoints

COPY . /app

CMD ["python3", "app.py", "--production"]
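An alternative to the wget/mkdir/mv lines would be to let torchvision populate the cache itself at build time; instantiating the model with pretrained weights downloads the checkpoint into /root/.cache/torch/hub/checkpoints. This is a sketch, not tested in this image, and assumes torchvision>=0.13 for the weights enum:

```dockerfile
# Hypothetical alternative: torchvision downloads vgg16-397923af.pth into
# its own cache directory, so no wget/mkdir/mv is needed.
RUN python3 -c "from torchvision.models import vgg16, VGG16_Weights; vgg16(weights=VGG16_Weights.IMAGENET1K_V1)"
```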

@keighrim
Member

What is the advantage of pip-installing torch/torchvision directly instead of via requirements.txt? I'm working on #30, which now additionally needs a YAML handler as a dependency. I wonder whether we should keep two (identical) dependency specs (one in the Dockerfile and one in requirements.txt) unless there's a clear benefit to doing so.

Also, for backbone models, I think there must be a better way to download a pth file based on the model choice of ours, instead of hard-coding vgg URL (or other better performing models) manually.

@keighrim
Member

For model download, we can do something like this.

Given a model config YAML file with a `model_type` key, in backbones.py:

...
if __name__ == "__main__":
    import sys
    # pass the model choice via the CLI
    model_map[sys.argv[1]]()
    # Instantiating an `ExtractorModel` will also download the pth file and
    # initiate the torchvision model. When this code terminates, the
    # downloaded model file stays in the local cache directory.

then in dockerfile

...
# tr strips the space that cut leaves after the colon
RUN python -m modeling.backbones $(grep "model_type" modeling/classifier-config.yaml | cut -d: -f2 | tr -d ' ')

CMD ["python3", "app.py", "--production"]
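For concreteness, the `model_map` and `ExtractorModel` referenced above might look like this minimal sketch; the names follow the snippet above, but the class body and the URL mapping are assumptions (the real class would build the torchvision model and download its weights on instantiation):

```python
from dataclasses import dataclass

@dataclass
class ExtractorModel:
    """Stand-in for the real extractor class: holds a model name and its
    checkpoint URL. The real implementation would download the .pth file
    and construct the torchvision model when instantiated."""
    name: str
    checkpoint_url: str

# Map a model_type string from the config to a factory for that model.
model_map = {
    "vgg16": lambda: ExtractorModel(
        "vgg16", "https://download.pytorch.org/models/vgg16-397923af.pth"),
}

if __name__ == "__main__":
    import sys
    # pass the model choice via the CLI, e.g. `python backbones.py vgg16`
    model = model_map[sys.argv[1]]()
    print(model.name)
```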

@marcverhagen
Contributor Author

Maybe a little bit ugly, with the grep inside the container file and because it requires the backbones file to cater to the container file, but we can play with that a bit. Definitely better than

RUN wget https://download.pytorch.org/models/vgg16-397923af.pth

@keighrim
Member

Fixed via #48.
