Should not declare VOLUME for /data/db #306
Storing already-initialized database files in the image layers is not a great idea. Because the layers sit on a copy-on-write filesystem, the moment you start a new container and MongoDB changes any of its files, each changed file is copied up into the container's writable layer, so it now uses twice as much space. See also https://docs.docker.com/storage/storagedriver/overlayfs-driver/#modifying-files-or-directories
And https://docs.docker.com/storage/storagedriver/overlayfs-driver/#performance-best-practices
Have you thought of using
Hi @yosifkit, thank you for your detailed explanation!
I spent quite some time trying to understand why my bind volume was not being used by the mongo Docker image, and eventually discovered the hardcoded VOLUME in the Dockerfile. I'm fine with rebuilding the image without this declaration (which I did), but I find it odd to force the use of volumes when it should be up to the end user to decide how to store data.
In my case, I want to construct a seeded database image that will be the basis for runtime containers that use a persistent, named volume for If I fork the official mongo Dockerfile and remove the
The storage-space trade-off is worth it for me, since it saves me a lot of time whenever I need to reset to a known state. My seeded DB is a few hundred megabytes to a gigabyte, which I can easily spare on my dev machine or CI instances.
In my case, seeding the DB from a known dump takes 8-10 minutes on my workstation. I may need to reset the DB to a known state dozens of times while debugging DB migrations or a new feature. That reset takes seconds with a seeded image, but would be untenable if I had to restore the DB every time. I can work around it, but I'd love to see an official
If you add `--dbpath`, or set `.storage.dbPath` in the file passed via `--config`, that value will be respected.
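A minimal sketch of the `--config` route, assuming a `mongo:7.0` base, a hypothetical config file at `/etc/mongod-custom.conf`, and a hypothetical data directory `/mongo-data/db`. That directory is chosen only because it is not one of the image's declared VOLUMEs, so anything written there at build time stays in the image layers.

```dockerfile
FROM mongo:7.0

# Hypothetical data directory outside the declared VOLUMEs (/data/db, /data/configdb),
# owned by the image's mongodb user so mongod can write to it at runtime.
RUN mkdir -p /mongo-data/db && chown -R mongodb:mongodb /mongo-data

# Minimal YAML config that sets storage.dbPath (the file name is an assumption).
RUN printf 'storage:\n  dbPath: /mongo-data/db\n' > /etc/mongod-custom.conf

# Arguments starting with "-" are handed to mongod by the image's entrypoint.
CMD ["--config", "/etc/mongod-custom.conf"]
```

Passing `--dbpath /mongo-data/db` directly as the command should have the same effect without the extra config file.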
I ended up using this, but it means I have to make sure compose files or
If you're building an image with the data pre-seeded, you can work around that by setting `CMD ["--dbpath", "/mongo-data/db"]`, which then becomes the default for users who don't specify a command.
Yeah, this is the approach aparamon outlined, and it's the strategy I'm currently using. I agree with aparamon that a seeded image is a legitimate use case, and the official image doesn't work well for seeding at build time because it contains

As a tooling developer, it would help to have an official mongo image without
I'm not sure why, but this is a common issue with the DB images (there are equivalent issues for postgres, mysql, and redis). These issues remain open for many years, with little valid justification for why

EDIT: I understand for these DB images:
The concern about "twice as much space" seems moot, since that's already happening implicitly? For MongoDB, this is 300MB+ per container instance created.

Prepopulated `VOLUME` (not applicable to mongo usage):
An implicit anonymous volume copies data from the image to the host for each container instance created. This is wasteful and accumulates over runs (if not removing afterwards via

The example I link to is fairly simple:
It should be opt-in. A container still persists internal state until it's destroyed/removed. If a user wants to persist the data or have better performance, they should provide a volume explicitly?
So while volumes are a best practice, I disagree about the

Other known concerns with
For the original use case, an init container would serve this much better. You could have an image with your data copied in and use
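One possible shape for such a seed image, sketched under several assumptions: `busybox` as the base, a hypothetical `seed-db/` directory holding database files produced by an earlier offline seeding run, and a shared volume mounted at `/data/db` in both this container and the mongo container. Ordering (init container finishes before mongod starts) is left to whatever orchestrates the two containers.

```dockerfile
# Hypothetical init-container image: it carries prebuilt MongoDB data files
# and copies them into a shared volume, then exits.
FROM busybox

# "seed-db/" is a placeholder for data files from an offline seeding run.
COPY seed-db/ /seed/db/

# Expects the shared volume at /data/db; copies only if it is still empty,
# so existing data is not overwritten on later runs.
CMD ["sh", "-c", "if [ -z \"$(ls -A /data/db)\" ]; then cp -a /seed/db/. /data/db/; fi"]
```

File ownership of the copied data may still need to match the mongodb user in the mongo container; this sketch does not handle that.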
This could work for some users. It seems best used with strong orchestration (e.g. docker compose) and a very small DB seed. Some drawbacks:
Baking DB data into your own mongo image makes it fast and simple to launch new database containers that are in a valid state from first boot, and the official mongo images don't work well as a base for that. It is also time-consuming and difficult to figure out why your build-time data goes missing when you base your Dockerfile on the official mongo images. However, the comment polarathene linked seems to observe that BuildKit treats
Yep!

```dockerfile
FROM mongo:7.0
RUN touch /data/db/foo
```

```console
$ docker buildx build .
...
#6 writing image sha256:1ca35025fc343c6e199f7abedad161821524f8374d78d474a923aab571d8473f done
#6 DONE 0.0s
$ docker run --rm sha256:1ca35025fc343c6e199f7abedad161821524f8374d78d474a923aab571d8473f ls -l /data/db/
total 0
-rw-r--r-- 1 root root 0 Mar 18 23:11 foo
```
Currently, the Dockerfile declares VOLUME for /data/db and /data/configdb (mongo/4.1/Dockerfile, line 87 in 32e5645).
This is suboptimal, because some workflows in derived images become excessively complicated.
For example, seeding a database from a dump now requires
instead of just
```dockerfile
RUN mongod --fork --logpath /var/log/mongodb.log && \
    mongorestore --db mydb /var/mongo-dump/mydb && \
    mongod --shutdown
```
because /data/db doesn't persist between `docker build` and `docker run` invocations.

It is proposed to remove the VOLUME directive and leave volume configuration up to the end user.
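A minimal sketch of a seeding workaround that follows the `--dbpath` suggestion made in this thread: restore into a directory that is not declared as a VOLUME, then make it the default data path. It assumes a `mongo:7.0` base, a hypothetical dump directory `mydb-dump/` in the build context, and the `/var/mongo-dump/mydb` and `/mongo-data/db` paths used elsewhere in the discussion.

```dockerfile
FROM mongo:7.0

# Hypothetical location of a mongodump output directory in the build context.
COPY mydb-dump/ /var/mongo-dump/mydb/

# /mongo-data/db is not a declared VOLUME, so the restored files written by this
# RUN step are kept in the image layers instead of being discarded.
RUN mkdir -p /mongo-data/db && \
    mongod --fork --logpath /var/log/mongodb.log --dbpath /mongo-data/db && \
    mongorestore --db mydb /var/mongo-dump/mydb && \
    mongod --shutdown --dbpath /mongo-data/db && \
    chown -R mongodb:mongodb /mongo-data

# Default arguments; users who pass their own command override this.
CMD ["--dbpath", "/mongo-data/db"]
```

`mongod --shutdown` is given the same `--dbpath` because it otherwise defaults to `/data/db`. The `--db` form of `mongorestore` is kept from the original snippet; newer tool releases may warn and suggest `--nsInclude` instead.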