[Cog v0.8.0 error after upgrading from v0.7.2] Error on Cog build: exec: /sbin/ldconfig.real: not found #1189
Comments
Thanks for writing this up, @Glavin001. I'm sorry you're hitting this issue. I've got a few theories at the moment:
|
Thanks for your prompt reply and ideas!
$ cog build
Building Docker image from environment in cog.yaml as cog-replicate-startup-intervie...
ERROR: permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/_ping": dial unix /var/run/docker.sock: connect: permission denied
ⅹ Failed to build Docker image: exit status 1
|
Also, with this exact config, I think I was on Cog v0.7.2 before and only recently moved to Cog v0.8.0 (released 3 days ago)? |
I'm seeing a lot of mentions of WSL (Linux on Windows?) in related issues: microsoft/WSL#4760 Maybe LambdaLabs is using Windows in their stack? Not sure how to verify |
There may be a way to check: microsoft/WSL#4071 (comment) I'll try tonight. |
Doesn't look like Windows WSL?
ubuntu@IP:~$ /proc/version
bash: /proc/version: Permission denied
ubuntu@IP:~$ sudo /proc/version
sudo: /proc/version: command not found
ubuntu@IP:~$ uname -a
Linux IP 5.15.0-67-generic #74~20.04.1-Ubuntu SMP Wed Feb 22 14:52:34 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
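The `/proc/version` attempts above fail because it is a file, not a command. A minimal sketch of the check, assuming (this is not stated in the thread) that WSL kernels carry "microsoft" or "WSL" in their version string; `looks_like_wsl` is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if the given kernel version string looks like WSL, 1 otherwise.
# Assumption: WSL kernels include "microsoft" or "WSL" in the version string.
looks_like_wsl() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *microsoft*|*wsl*) return 0 ;;
    *) return 1 ;;
  esac
}

# /proc/version is a file, so read it with cat instead of executing it.
if looks_like_wsl "$(cat /proc/version 2>/dev/null)"; then
  echo "WSL detected"
else
  echo "not WSL"
fi
```

The same function can be fed the `uname -a` line instead, which is what the output above already suggests: a stock Ubuntu kernel, not WSL.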
v0.8.0 is the issue. Workaround: Downgrading to v0.7.2 fixes the issues! 🎉 ✅
$ sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/download/v0.7.2/cog_Linux_x86_64"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 9444k 100 9444k 0 0 10.7M 0 --:--:-- --:--:-- --:--:-- 56.5M
$ sudo chmod +x /usr/local/bin/cog
$ cog --version
cog version 0.7.2 (built 2023-05-23T10:20:56Z)
$ sudo cog --version
cog version 0.7.2 (built 2023-05-23T10:20:56Z)
$ sudo cog debug
⚠ Cog doesn't know if CUDA 11.8 is compatible with PyTorch 2.0.0. This might cause CUDA problems.
# syntax = docker/dockerfile:1.2
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
RUN --mount=type=cache,target=/var/cache/apt set -eux; \
apt-get update -qq; \
apt-get install -qqy --no-install-recommends curl; \
rm -rf /var/lib/apt/lists/*; \
TINI_VERSION=v0.19.0; \
TINI_ARCH="$(dpkg --print-architecture)"; \
curl -sSL -o /sbin/tini "https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"; \
chmod +x /sbin/tini
ENTRYPOINT ["/sbin/tini", "--"]
ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
make \
build-essential \
libssl-dev \
zlib1g-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
wget \
curl \
llvm \
libncurses5-dev \
libncursesw5-dev \
xz-utils \
tk-dev \
libffi-dev \
liblzma-dev \
git \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
RUN curl -s -S -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash && \
git clone https://github.com/momo-lab/pyenv-install-latest.git "$(pyenv root)"/plugins/pyenv-install-latest && \
pyenv install-latest "3.10" && \
pyenv global $(pyenv install-latest --print "3.10") && \
pip install "wheel<1"
COPY .cog/tmp/build4048584965/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl
RUN --mount=type=cache,target=/root/.cache/pip pip install /tmp/cog-0.0.1.dev-py3-none-any.whl
COPY .cog/tmp/build4048584965/requirements.txt /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt
WORKDIR /src
EXPOSE 5000
CMD ["python", "-m", "cog.server.http"]
COPY . /src |
Here's the diff:
--- v0.7.2.txt 2023-07-11 05:31:08
+++ v0.8.0.txt 2023-07-11 05:31:28
@@ -1,18 +1,14 @@
$ sudo cog debug
-⚠ Cog doesn't know if CUDA 11.8 is compatible with PyTorch 2.0.0. This might cause CUDA problems.
-# syntax = docker/dockerfile:1.2
+#syntax=docker/dockerfile:1.4
+FROM curlimages/curl AS downloader
+ARG TINI_VERSION=0.19.0
+WORKDIR /tmp
+RUN curl -fsSL -O "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini" && chmod +x tini
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
-RUN --mount=type=cache,target=/var/cache/apt set -eux; \
-apt-get update -qq; \
-apt-get install -qqy --no-install-recommends curl; \
-rm -rf /var/lib/apt/lists/*; \
-TINI_VERSION=v0.19.0; \
-TINI_ARCH="$(dpkg --print-architecture)"; \
-curl -sSL -o /sbin/tini "https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"; \
-chmod +x /sbin/tini
+COPY --link --from=downloader /tmp/tini /sbin/tini
ENTRYPOINT ["/sbin/tini", "--"]
ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
@@ -40,9 +36,9 @@
pyenv install-latest "3.10" && \
pyenv global $(pyenv install-latest --print "3.10") && \
pip install "wheel<1"
-COPY .cog/tmp/build4048584965/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl
+COPY .cog/tmp/build4127551442/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl
RUN --mount=type=cache,target=/root/.cache/pip pip install /tmp/cog-0.0.1.dev-py3-none-any.whl
-COPY .cog/tmp/build4048584965/requirements.txt /tmp/requirements.txt
+COPY .cog/tmp/build4127551442/requirements.txt /tmp/requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip pip install -r /tmp/requirements.txt
WORKDIR /src
EXPOSE 5000
|
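The comparison above can be reproduced with a short script, sketched here under the assumption that each version's `cog debug` output has been saved to a file. `compare_debug_output` is a hypothetical helper name, not part of Cog.

```shell
#!/bin/sh
# Print a unified diff between two saved `cog debug` outputs.
# diff exits with status 1 when the files differ, which is the expected case
# here, so only status 2 or higher (a real error) makes this function fail.
compare_debug_output() {
  status=0
  diff -u "$1" "$2" || status=$?
  [ "$status" -le 1 ]
}

# Usage (file names as in the diff header above):
#   sudo cog debug > v0.7.2.txt   # with the v0.7.2 binary installed
#   sudo cog debug > v0.8.0.txt   # with the v0.8.0 binary installed
#   compare_debug_output v0.7.2.txt v0.8.0.txt
```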
Thanks to @Glavin001 for the quick workaround! Here's the quick downgrade code:
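A minimal sketch of the downgrade, based on the commands in the workaround comment earlier in the thread. The URL pattern is taken from that comment; `cog_release_url` and `install_cog` are hypothetical helper names introduced for illustration.

```shell
#!/bin/sh
# Build the download URL for a given Cog release.
# Arguments: version tag (e.g. v0.7.2), OS name, CPU architecture.
# The OS and architecture default to this machine's `uname` values.
cog_release_url() {
  url_version="$1"
  url_os="${2:-$(uname -s)}"
  url_arch="${3:-$(uname -m)}"
  printf 'https://github.com/replicate/cog/releases/download/%s/cog_%s_%s\n' \
    "$url_version" "$url_os" "$url_arch"
}

# Download the requested release over the existing binary and make it executable.
# This needs sudo and network access, so it is only defined here, not called.
install_cog() {
  sudo curl -o /usr/local/bin/cog -L "$(cog_release_url "$1")"
  sudo chmod +x /usr/local/bin/cog
  cog --version
}

# Usage: install_cog v0.7.2
```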
|
I also hit this on lambdalabs machines, cog version 0.8.1. Another workaround is to set |
I think I've got a hold of what caused the issue here. To verify it, I created three simplified Dockerfiles.
The Original One (v0.7.2) works
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
## Here's the original part that installed tini
RUN --mount=type=cache,target=/var/cache/apt set -eux; \
apt-get update -qq; \
apt-get install -qqy --no-install-recommends curl; \
rm -rf /var/lib/apt/lists/*; \
TINI_VERSION=v0.19.0; \
TINI_ARCH="$(dpkg --print-architecture)"; \
curl -sSL -o /sbin/tini "https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TINI_ARCH}"; \
chmod +x /sbin/tini
ENTRYPOINT ["/sbin/tini", "--"]
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
make \
build-essential \
libssl-dev \
zlib1g-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
wget \
curl \
llvm \
libncurses5-dev \
libncursesw5-dev \
xz-utils \
tk-dev \
libffi-dev \
liblzma-dev \
git \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
CMD ["python", "-m", "cog.server.http"]
Building works fine:
The New One (v0.8.1) fails
This one fails at
## This is the new part that downloads tini in downloader
FROM curlimages/curl AS downloader
ARG TINI_VERSION=0.19.0
WORKDIR /tmp
RUN curl -fsSL -O "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini" && chmod +x tini
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
## This is the new part that installs tini
COPY --link --from=downloader /tmp/tini /sbin/tini
ENTRYPOINT ["/sbin/tini", "--"]
ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
make \
build-essential \
libssl-dev \
zlib1g-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
wget \
curl \
llvm \
libncurses5-dev \
libncursesw5-dev \
xz-utils \
tk-dev \
libffi-dev \
liblzma-dev \
git \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
CMD ["python", "-m", "cog.server.http"]
Removing the tini part in the new one (v0.8.1) works
The following Dockerfile only removes the tini downloader stage and the tini COPY command. Now it builds successfully.
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/local/nvidia/lib64:/usr/local/nvidia/bin
ENV PATH="/root/.pyenv/shims:/root/.pyenv/bin:$PATH"
RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends \
make \
build-essential \
libssl-dev \
zlib1g-dev \
libbz2-dev \
libreadline-dev \
libsqlite3-dev \
wget \
curl \
llvm \
libncurses5-dev \
libncursesw5-dev \
xz-utils \
tk-dev \
libffi-dev \
liblzma-dev \
git \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
CMD ["python", "-m", "cog.server.http"]
Conclusion
It's clear that the change to how tini is installed breaks apt-get. I'm not sure how it breaks internally, but reverting the tini change might be the correct solution here. cc @mattt |
When you set |
I found the nuance! The new one MISSED the
vs
|
I want to share some good news. After applying the fix in #1208, cog build and run work again! [screenshot] Here's the Dockerfile: [screenshot] |
Hi everyone, apologies - I pushed this change in hopes of making the image smaller and faster to build. It seems like you might have an older version of the cuda base image. The current version of 11.8.0-cudnn8-devel-ubuntu22.04 already has libc-bin installed, and also has /sbin/ldconfig.real. My guess is maybe the |
Here's the output of
|
Can confirm this is still broken on Lambda Labs with new instances. Given that's the recommended cloud workflow, it's not fun that there's no fix yet! Is there a workaround besides downgrading back to Cog v0.7.2? |
@djj0s3, could you paste |
cog debug:
docker images --no-trunc|grep cuda yields no output. Did you mean something else? |
I'm using cog 0.7.2, and I receive another error. Is there a workaround for this as well?
|
Hi @Glavin001. Thanks for your help and patience as we try to debug this issue. I apologize for the inconvenience this caused. We just released Cog v0.8.2. This release includes #1231, which reverts #1161, which we believe to be the cause of the regression you're seeing. Please give that a try when you have a chance and let us know if you're still having this issue. Thanks! 🙏 |
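To tell whether an installed binary is one of the releases reported as broken in this thread (v0.8.0 and v0.8.1), the `cog --version` output can be parsed. This is a sketch only: `parse_cog_version` and `is_affected_version` are hypothetical helpers, and the output format is assumed to match the `cog version 0.7.2 (built ...)` line shown earlier.

```shell
#!/bin/sh
# Extract the version number from `cog --version` output,
# e.g. "cog version 0.7.2 (built 2023-05-23T10:20:56Z)" gives "0.7.2".
parse_cog_version() {
  printf '%s\n' "$1" | sed -n 's/^cog version \([0-9][0-9.]*\).*/\1/p'
}

# Return 0 if the version is one reported as broken in this thread.
is_affected_version() {
  case "$1" in
    0.8.0|0.8.1) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   version="$(parse_cog_version "$(cog --version)")"
#   if is_affected_version "$version"; then
#     echo "Cog $version is affected: upgrade to v0.8.2 or downgrade to v0.7.2"
#   fi
```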
Impact: I'm unable to build any image using Cog and therefore cannot deploy any models to Replicate.
On both Lambdalabs and TensorDock:
cog.yaml:
I receive the following error logs: