5.4 DigitalOcean Build error out of memory #71

Open
sjud opened this issue Mar 14, 2021 · 24 comments

Comments

@sjud

sjud commented Mar 14, 2021

Hello,
I am on section 5.4, trying to get the app running on DigitalOcean. I'm having trouble getting it to deploy: it compiles on DigitalOcean, then stalls for a while before issuing this error.

Build Error: Out of Memory

Your build job failed because it was out of memory.
Error code: BuildJobOutOfMemory

I went up to 4GB of RAM to see if that would change the result and it had no effect. Most searches suggest increasing memory but I imagine that other people have been able to run the app with less. Here's the last of the log:

zero2prod | 18:47:21 INFO[1231] Changed working directory to /app
zero2prod | 18:47:21 INFO[1231] Creating directory /app
zero2prod | 18:47:21 INFO[1231] Taking snapshot of files...
zero2prod | 18:47:21 INFO[1231] COPY . .
zero2prod | 18:47:21 INFO[1231] Taking snapshot of files...
zero2prod | 18:47:21 INFO[1231] COPY --from=cacher /app/target target
zero2prod | 18:47:47 INFO[1257] Taking snapshot of files...
zero2prod | 18:48:33 INFO[1303] COPY --from=cacher /usr/local/cargo /usr/local/cargo
zero2prod | 18:49:22 INFO[1352] Taking snapshot of files...

The web service doesn't expose a console, at least not during this stage, so I'm not sure how to debug the problem further; any advice would be appreciated.
Thank you. :)

@sjud
Author

sjud commented Mar 15, 2021

So, I noticed that the build kept getting stuck in the cargo-chef stages, and when I deleted those it built. Right now my Dockerfile is:
FROM rust:1.50 AS builder
WORKDIR app
COPY . .
ENV SQLX_OFFLINE true
RUN cargo build --release --bin zero2prod

FROM debian:buster-slim AS runtime
WORKDIR app
RUN apt-get update -y \
    && apt-get install -y --no-install-recommends openssl \
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/zero2prod zero2prod
COPY configuration configuration
ENV APP_ENVIRONMENT production
ENTRYPOINT ["./zero2prod"]

And it also built in about the same amount of time (~20 minutes) that it took to hit the error I was having. I'm not sure exactly what cargo-chef was doing, so I can't comment further, but I'll leave this open in case it's of interest.

@LukeMathWalker
Owner

My wild guess is that one of the COPY directives is causing RAM usage to go above the (very constrained) capacity of DigitalOcean's builder.
This is so annoying, sorry you had to troubleshoot it!

@Giesch

Giesch commented Apr 27, 2021

I had successfully deployed zero2prod on DO using cargo-chef in an earlier chapter, but then ran into this issue when I came back to it and caught up. Because I'm not the only one who ran into it, and sjud is on the earlier chapter, I suspect something changed on DO's side. There are a number of issues on the image builder DO uses, like this one, that imply a version bump made it struggle with multi-stage builds involving a lot of files (like cargo's target or npm's node_modules).

@eeff

eeff commented May 30, 2021

I want to write down my experience deploying to the DigitalOcean App Platform following #4-deploy-to-digitalocean-apps-platform, hoping this will save somebody a whole day.

About leveraging Docker's caching capability

The post talks about optimizing the build image size, and the Dockerfile has the following structure:

FROM lukemathwalker/cargo-chef as planner
...
FROM lukemathwalker/cargo-chef as cacher
...
FROM rust:1.50 AS builder
...
FROM debian:buster-slim AS runtime

The lukemathwalker/cargo-chef image is based on the rust image. Specifying rust:1.50 as the base of the builder stage does not guarantee that the builder stage will leverage the cache from the cacher stage, because the version of the rust image that lukemathwalker/cargo-chef is based on may not be the same as rust:1.50. On my local machine the dependencies ended up being recompiled in the builder stage. What's worse, cargo generated artifacts for the whole project again, making the builder image size explode to about 7 GiB on my machine!

As a solution, I explicitly specify the rust image version and install cargo-chef:

############### Planner stage ###############
FROM rust:1.49 AS planner

WORKDIR /app

RUN cargo install cargo-chef

# Copy all files from our working environment
COPY . .

# Compute a lock-like file for our project
RUN cargo chef prepare --recipe-path recipe.json


############### Cacher stage ###############
FROM rust:1.49 AS cacher

WORKDIR /app

RUN cargo install cargo-chef

COPY --from=planner /app/recipe.json recipe.json

# Build our project dependencies, not our application
RUN cargo chef cook --release --recipe-path recipe.json

############### Builder stage ###############

# We use the latest Rust stable release as base image
FROM rust:1.49 AS builder

WORKDIR /app

# Copy over the cached dependencies
COPY --from=cacher /app/target target
COPY --from=cacher $CARGO_HOME $CARGO_HOME
...

This does solve the problem and reduces the image build time.

About deploying to DigitalOcean

With the modified Dockerfile, I headed off to deploy to DigitalOcean, and it failed:

Build Error: Out of Memory

Your build job failed because it was out of memory.
Error code: BuildJobOutOfMemory

This error message is not very helpful and is misleading.
The support team told me that the resources for builds are 8 GiB of combined RAM and disk space.
In this case it is more about the disk space than the RAM.
Looking at DigitalOcean's deployment log, I found these lines:

2021-05-30T04:53:10.806979004Z INFO[2338] RUN cargo build --release --bin zero2prod
2021-05-30T04:53:10.807015294Z INFO[2338] Taking snapshot of full filesystem...
2021-05-30T04:53:59.618085679Z INFO[2387] cmd: /bin/sh
2021-05-30T04:53:59.618123452Z INFO[2387] args: [-c cargo build --release --bin zero2prod]
2021-05-30T04:53:59.618303719Z INFO[2387] Running: [/bin/sh -c cargo build --release --bin zero2prod]
2021-05-30T04:54:03.447134582Z Compiling libc v0.2.94
2021-05-30T04:54:03.452872384Z Compiling tokio v1.6.0
2021-05-30T04:54:03.510041285Z Compiling num-traits v0.2.14
... # a lot more lines

It's recompiling the dependencies again! But why? The support team further told me that they are using kaniko to build from the Dockerfile instead of the usual Docker daemon. Anyway, it turns out kaniko does not respect the cache.

The solution I finally landed on is to use the container registry:

# spec.yaml
name: zero2prod
region: sgp
services:
  - name: zero2prod
    image:
      registry_type: DOCR
      repository: zero2prod
...

If you can tolerate the painful build time on DigitalOcean, another solution is to use the simple Dockerfile that avoids caching.

Updates

Replace cargo-chef with cargo-build-deps (working solution)

After a little googling, I found cargo-build-deps, which utilizes cargo build -p and does not need a recipe.json file.
Not having to generate a bookkeeping file makes the Docker build process simpler, and possibly helps out kaniko.
To give it a try, I updated the Dockerfile, and bingo, it works!

Please note that cargo-build-deps enforces a cargo update before building the dependencies, which I think is not the desired behavior, so I just made my own clone of it. See the issue.
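
For reference, here is a rough sketch of what a builder stage using cargo-build-deps could look like, reconstructed from cargo-build-deps' documented usage rather than taken from eeff's actual Dockerfile (the skeleton-project step, the paths, and the sqlx-data.json copy are assumptions):

############### Builder stage (sketch) ###############
FROM rust:1.49 AS builder

RUN cargo install cargo-build-deps

# Create a skeleton project so the dependencies can be built from the manifests alone
RUN USER=root cargo new --bin zero2prod
WORKDIR /zero2prod

# Copy only the manifests and pre-build the dependency tree
COPY Cargo.toml Cargo.lock ./
RUN cargo build-deps --release

# Now copy the real sources and build just our crate;
# the dependency artifacts from the previous layer are reused
COPY src ./src
COPY sqlx-data.json ./
ENV SQLX_OFFLINE true
RUN cargo build --release --bin zero2prod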

@gihrig

gihrig commented May 30, 2021

@eeff Thanks for the detailed write-up!

I ran into the same out-of-memory error and, rather than engage in the extensive troubleshooting journey you documented, I gave up on DO and built my own Docker host on a VPS.

That is a significant project in its own right and lacks some of DO's features, but it offers a lot more power for the money if we're talking about a full-time production app.

As Luke put it:

"deployments are (still) a messy business."

@frjonsen

frjonsen commented Jun 19, 2021

@eeff Thank you very much. This also helped me figure out why my builds were so much slower than I'd expect: it wasn't using the cache, and instead recompiled everything in the builder stage.

Unfortunately while this did improve things slightly, it did not resolve the error. I will attempt to use cargo-build-deps and see if that helps.

EDIT: I went with the easiest solution I could think of: linking my GitHub account to a Docker Hub account, building the image there, and then using the image option in the spec.yaml for DigitalOcean instead of pulling from GitHub. It does involve an extra component, having to go via Docker Hub, but the end result seems to be the same.
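
If I remember the App Platform spec correctly, the image section of the spec.yaml looks roughly like this when pulling from Docker Hub instead of DOCR (the registry, repository, and tag values below are placeholders):

# spec.yaml (sketch)
name: zero2prod
services:
  - name: zero2prod
    image:
      registry_type: DOCKER_HUB
      registry: your-dockerhub-username
      repository: zero2prod
      tag: latest
...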

@sr-fuentes

I am running into a similar issue deploying to DO using the latest Dockerfile from the 20210712 version of the book. The build job fails with these logs:

[2021-07-16 20:40:53] INFO[0140] WORKDIR /app
[2021-07-16 20:40:53] INFO[0140] cmd: workdir
[2021-07-16 20:40:53] INFO[0140] Changed working directory to /app
[2021-07-16 20:40:53] INFO[0140] Creating directory /app
[2021-07-16 20:40:53] INFO[0140] Taking snapshot of files...
[2021-07-16 20:40:53] INFO[0140] COPY --from=cacher /app/target target
[2021-07-16 20:40:53] error building image: error building stage: failed to execute command: resolving src: failed to get fileinfo for /kaniko/1/app/target: lstat /kaniko/1/app/target: no such file or directory
[2021-07-16 20:40:53]
[2021-07-16 20:40:53] command exited with code 1
[2021-07-16 20:40:56] ! Build failed (exit code 1)

@gyzerok

gyzerok commented Jul 22, 2021

Just wanted to mention that I've got the very same problem. I solved it by removing cargo-chef for now. However, it would be nice to use layer caching; otherwise build times are crazy :)

@aboseley

aboseley commented Aug 7, 2021

I also removed cargo-chef to make it work:

FROM rust:1.54.0 AS builder
WORKDIR /app
COPY . .
COPY configuration configuration
ENV SQLX_OFFLINE true
# Build our application, leveraging the cached dependencies
RUN cargo install --path .

FROM debian:buster-slim
RUN apt-get update -y && \
    apt-get install -y openssl \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /usr/local/cargo/bin/zero2prod /usr/local/bin/
COPY configuration configuration
ENV APP_ENVIRONMENT production
ENTRYPOINT ["/usr/local/bin/zero2prod"]

It took 40 minutes to build and deploy on DigitalOcean.

@LukeMathWalker
Owner

Can you try this edited Dockerfile, which still includes cargo-chef but avoids copying the cached dependencies between stages?

FROM lukemathwalker/cargo-chef:latest-rust-1.53.0 as chef
WORKDIR /app

FROM chef as planner
COPY . .
RUN cargo chef prepare  --recipe-path recipe.json

FROM chef as builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
ENV SQLX_OFFLINE true
RUN cargo build --release --bin zero2prod

FROM debian:buster-slim AS runtime
WORKDIR /app
RUN apt-get update -y \
    && apt-get install -y --no-install-recommends openssl \
    # Clean up
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/zero2prod zero2prod
COPY configuration configuration
ENV APP_ENVIRONMENT production
ENTRYPOINT ["./zero2prod"]

@LukeMathWalker
Owner

I managed to try it out this morning and I can confirm it no longer fails due to an out-of-memory error.
The revised Dockerfile will be included in the next release.

@boyswan

boyswan commented Dec 17, 2021

I'm still having the same issues as described above. Is this expected to be fixed in the latest version of cargo chef?

@LukeMathWalker
Owner

This is not a cargo-chef issue unfortunately - it's a fundamental limitation of the build machines on DO combined with high resource usage by the Rust compiler.
The real solution is going to be ditching DO for Docker builds, I'm afraid.

@norman784

In my case the OOM happened when taking a snapshot after cargo build --release. After testing different solutions, what worked for me was to build, copy the binary, and then clean the target directory, all in the same RUN command.

RUN cargo build --release --bin zero2prod && \
    cp /app/target/release/zero2prod zero2prod && \
    cargo clean

In the final stage I then copy the binary from /app/zero2prod instead of /app/target/release/zero2prod. One downside is that I ended up removing the chef step, so my build times are not great, but they are acceptable for the moment.
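
Putting the two pieces together, the relevant lines look roughly like this (assuming a builder stage named builder with WORKDIR /app, as in the book's Dockerfile):

# Builder stage: build, move the binary out of target/, then reclaim the disk space
RUN cargo build --release --bin zero2prod && \
    cp /app/target/release/zero2prod zero2prod && \
    cargo clean

# Runtime stage: pick the binary up from /app/zero2prod
# instead of /app/target/release/zero2prod
COPY --from=builder /app/zero2prod zero2prod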

@chamons

chamons commented Mar 14, 2022

@LukeMathWalker - I'd like to suggest this issue be reopened. I just ran into this today while finishing up chapter 10, so it still seems to be an issue.

I tried a number of solutions, including bumping the size of my machine and RUN cargo build --release --bin zero2prod && cp ./target/release/zero2prod ./zero2prod && cargo clean in my Dockerfile, but to no avail so far.

Edit: I've tried bumping the production server up multiple tiers to no avail.

@LukeMathWalker
Owner

Unfortunately the size of the production server has no influence on the size of the build server 😞

@chamons

chamons commented Mar 14, 2022

I'm looking into using the Docker registry along with GitHub's CI to resolve this (build images on GH, not DO). If I get something working, I'll post details here.

@chamons

chamons commented Mar 15, 2022

Here is the workflow that works for me:

https://gist.github.com/chamons/654f005caf2318db7a0f818a3c33fe2d

You'll obviously need to replace the registry name caffeinated-gorilla, the app name zero-2-prod, and the tag to fit your configuration; a rough sketch of the workflow is included at the end of this comment.

You have to:

  • Set up a container registry
  • Change your DigitalOcean YAML configuration to point to an image (as shown in the diff)
  • Add a workflow in GitHub to build a Docker image and push it.

I do not have it set up to push on every build, as GitHub has usage limits I'm afraid of hitting, but that is possible.

The biggest thing missing is Docker image caching. I know there is a cache action, which should drastically reduce Docker build time in theory. I hope to mess with it tonight, but I wanted to share what I found.

This setup is significantly worse than the app builder, and I reached out to support to let them know, but it at least works.
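
For anyone who doesn't want to open the gist, a rough sketch of such a workflow is below; the action versions, the secret name, the trigger, and the caffeinated-gorilla / zero-2-prod names are placeholders, and the gist above remains the authoritative version:

# .github/workflows/build-and-push.yml (sketch)
name: Build and push container image
on:
  workflow_dispatch:

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # Authenticate against DigitalOcean and its container registry
      - uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DIGITALOCEAN_ACCESS_TOKEN }}
      - run: doctl registry login --expiry-seconds 600

      # Build the image with the project's Dockerfile and push it to DOCR
      - run: docker build -t registry.digitalocean.com/caffeinated-gorilla/zero-2-prod:latest .
      - run: docker push registry.digitalocean.com/caffeinated-gorilla/zero-2-prod:latest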

@JonShort

JonShort commented Mar 29, 2022

Just going through chapter 5 now and hit this - is there any fix on our side, or are we just waiting for DO to do something?

...in the meantime I've opened a DO support ticket (why not, since it's a paid service)


Edit - response from DO:

In App Platform, the build memory is shared between files and the processes running to build the application. Builds are limited to 8GiB of total memory. As of now, we cannot increase the memory allocated during the build phase. As with any file system, there is some per-file overhead so sites with lots of small files may count higher. The processes plus the per-file overhead is likely what’s leading to this OOM. We don’t have any immediate solutions on our end for this build error.

Increasing the tier unfortunately wouldn’t be helpful in the build phase. However, there is a workaround that you can give a try. You can consider building via Dockerfile outside of App Platform and leverage DOCR support (or Docker Hub) to deploy the image in the App Platform. You can also achieve the same using GitHub Actions.

TL;DR: nothing they can do; they recommend building the container elsewhere.

@LukeMathWalker
Owner

We are waiting for DO to do something.
You can work around the problem by building the Docker image via GitHub Actions and telling DO to use it, as @chamons described.

@JonShort

Update - I switched my .dockerignore to an allowlist pattern, just to ensure we're not copying any unnecessary build context from wherever DO runs the Docker build, and the build completed fine (plus one additional follow-up build).

See this commit

Probably a complete coincidence, but I thought I'd post here in case it helps anyone else.
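
For context, an allowlist-style .dockerignore looks roughly like this; the entries below are a guess based on the book's project layout, not necessarily the contents of the linked commit:

# Ignore everything...
*
# ...except what the Docker build actually needs
!src
!migrations
!configuration
!Cargo.toml
!Cargo.lock
!sqlx-data.json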

@bsl

bsl commented Apr 3, 2022

I could hardly know less about this, but when I got the OOM, I hit Retry in the Activity tab and it succeeded. Maybe something is being drawn from cache on the second attempt?

@jgirardet

Update - I switched my .dockerignore to an allowlist pattern, just to ensure we're not copying any unnecessary build context from wherever DO runs the Docker build, and the build completed fine (plus one additional follow-up build).

See this commit

Probably a complete coincidence, but I thought I'd post here in case it helps anyone else.

Same fix as yours did the trick here too.

@Ifletcher668

I ran into this issue recently as well. To be honest, I feel like I tried every approach here and none of them worked; then I tried each of them again and was finally able to push to DigitalOcean and have it succeed without the OOM error.

Figured I would leave this here in case anyone else was having trouble and this miraculously worked for them, too.

Dockerfile
FROM lukemathwalker/cargo-chef:latest-rust-1.59.0 as chef
WORKDIR /app
RUN apt update && apt install lld clang -y

FROM chef as planner
COPY . .
# Compute a lock-like file for our project
RUN cargo chef prepare  --recipe-path recipe.json

FROM chef as builder
COPY --from=planner /app/recipe.json recipe.json
# Build our project dependencies, not our application!
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
ENV SQLX_OFFLINE true
# Build our project
RUN cargo build --release --bin zero2prod

FROM debian:bullseye-slim AS runtime
WORKDIR /app
RUN apt-get update -y \
   && apt-get install -y --no-install-recommends openssl ca-certificates \
   # Clean up
   && apt-get autoremove -y \
   && apt-get clean -y \
   && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/zero2prod zero2prod
COPY config config
ENV APP_ENVIRONMENT production
ENTRYPOINT ["./zero2prod"]
Cargo.toml
[package]
name = "zero2prod"
version = "0.1.0"
edition = "2021"

[lib]
path = "src/lib.rs"

[[bin]]
path = "src/main.rs"
name = "zero2prod"

[dependencies]
actix-web = "4.0.1"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
serde = { version = "1", features = ["derive"]}
serde-aux = "3"
config = "0.11"
uuid = { version = "0.8.1", features = ["v4"] }
chrono = "0.4.15"
tracing = { version = "0.1", features = ["log"] }
tracing-log = "0.1"
tracing-subscriber = { version = "0.3", features = ["registry", "env-filter"] }
tracing-bunyan-formatter = "0.3"
secrecy = { version = "0.8", features = ["serde"] }
tracing-actix-web = "0.5"
# tracing-error <- look into this

[dependencies.sqlx]
version = "0.5.7"
default-features = false
features = [
"runtime-actix-rustls",
"macros",
"postgres",
"uuid",
"chrono",
"migrate",
"offline"
]

[dev-dependencies]
reqwest = "0.11"
once_cell = "1"
