Add snapshot image after initialization #11

Open
johanneskiesel opened this issue Mar 27, 2019 · 6 comments

Comments

@johanneskiesel

Currently, prepare commits the intermediate image after indexing, but not after initialization:
https://github.com/osirrc2019/jig/blob/4e3765cd59b0869c354b2d7c6f9da826624e470e/run.py#L47

Also committing after initialization can save time, network traffic, and disk space (thanks to the layered file system, the downloaded files are then stored only once rather than once for every image).

The tag could be something like "{}-initialized".format(args.tag)
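A minimal sketch of the proposed flow, assuming prepare runs each phase in a container started from the previous snapshot. `prepare`, `run_phase`, and `commit` are hypothetical names, not jig's actual code. With the Docker SDK for Python, `commit` could wrap `Container.commit(repository=..., tag=...)`.

```python
from typing import Callable, List, Tuple

# Hypothetical: start a container from `image`, run `phase` in it, return the container id.
RunPhase = Callable[[str, str], str]
# Hypothetical: snapshot `container` as repository:tag and return the image name.
Commit = Callable[[str, str, str], str]

SUFFIXES = {"init": "initialized", "index": "indexed"}


def prepare(repo: str, tag: str, run_phase: RunPhase, commit: Commit) -> List[Tuple[str, str]]:
    """Run init, then index, committing a snapshot image after each phase.

    Index starts from the initialized snapshot, so files downloaded during
    init live in one shared layer instead of being duplicated per image.
    Returns (phase, committed image) pairs in the order they were created.
    """
    image = "{}:{}".format(repo, tag)
    snapshots = []
    for phase in ("init", "index"):
        container = run_phase(image, phase)
        # e.g. "latest-initialized" after init, "latest-indexed" after index
        image = commit(container, repo, "{}-{}".format(tag, SUFFIXES[phase]))
        snapshots.append((phase, image))
    return snapshots
```

Because `run_phase` and `commit` are passed in, the sequencing can be checked without a Docker daemon.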

@lintool
Member

lintool commented Mar 27, 2019

I'm 👎 on this but open to discussion.

@ryan-clancy
Member

If we did this, we would have two images. For example:

  • anserini-test:latest-initialized after init is called
  • anserini-test:latest-indexed after index is called

where the first image would be the base image for the second.

I think this would lead to some odd lifecycle management, where we'd need to update the base image of the second to be the updated (after re-init) first image, if that's even possible. Another approach may be to start a container from the second image and re-run the init script, but this can also get complicated (init scripts would then need to be idempotent and clean up existing files before downloading new ones).

I'm 👎 on this too for now as it would add a lot of hidden complexity.
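The lifecycle concern can be stated as a simple invariant: the indexed image is only valid while its parent is the current initialized image. A minimal sketch under a simplified model, where `Image`, its `parent` field, and `is_stale` are hypothetical stand-ins for whatever image metadata a real tool would query. After a re-init produces a new initialized image, the old indexed image still points at the old layer, so unless its base can be swapped out, it has to be rebuilt.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Image:
    id: str                # content id of this snapshot
    parent: Optional[str]  # id of the image it was committed on top of


def is_stale(images: Dict[str, Image], initialized: str, indexed: str) -> bool:
    """True if the indexed snapshot no longer sits on the current initialized snapshot."""
    return images[indexed].parent != images[initialized].id
```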

@johanneskiesel
Author

Maybe there is some confusion here: why would you want to re-init an image? I thought init was just about setup. So my question is: why would I run setup every time I index a collection, when I could just start from a snapshot taken after setup completed?

But in case you do need to re-init an image (I can imagine this if you encountered an error or so), why not just create both latest-initialized and latest-indexed again? I see that you would need an additional "--purge" parameter (or similar) to allow people to force an init even if an initialized image already exists.
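The decision the proposed "--purge" parameter controls can be sketched as follows, assuming prepare can list the tags already present locally for the repository. `phases_to_run` and `local_tags` are hypothetical names; this is one possible reading of the proposal, not existing jig behavior.

```python
from typing import List, Set


def phases_to_run(tag: str, local_tags: Set[str], purge: bool = False) -> List[str]:
    """Decide which phases prepare needs to run for `tag`.

    An existing "<tag>-initialized" snapshot is reused, so setup runs only once.
    With purge=True both snapshots are rebuilt from the base image.
    """
    initialized = "{}-initialized".format(tag)
    if purge or initialized not in local_tags:
        return ["init", "index"]
    # Setup is already snapshotted: index starting from it.
    return ["index"]
```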

@lintool
Member

lintool commented Mar 29, 2019

I think the tradeoff is more complex lifecycle management... I think we're assuming that init/index will be done once and that's it.

I suppose with all the bells and whistles we can bind each subcommand to a hook and allow committing at each phase in a flexible manner? I'm inclined to punt on this for now though...
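Binding each subcommand to a hook could look roughly like this. It is a minimal sketch in which `PhaseHooks`, `on`, and `finished` are hypothetical names. Committing becomes just one callback registered for whichever phases should be snapshotted, so the default "init and index once" lifecycle stays unchanged.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A hook receives the id of the container that just finished a phase.
Hook = Callable[[str], None]


class PhaseHooks:
    """Hypothetical registry binding callbacks to prepare's subcommands."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Hook]] = defaultdict(list)

    def on(self, phase: str, hook: Hook) -> None:
        # Register `hook` to run after `phase` completes.
        self._hooks[phase].append(hook)

    def finished(self, phase: str, container: str) -> None:
        # Call every hook registered for this phase, in registration order.
        for hook in self._hooks[phase]:
            hook(container)
```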

@johanneskiesel
Author

I see. I want to say that it is not my intention to press this issue (a point that might have been lost in moving from the original email to this issue). I'm well aware that this can be added later without a problem (it requires no change to the specification), so you can just wait and see whether index is done just once or more often.

@lintool
Member

lintool commented Mar 31, 2019

No worries! Thanks for your contributions!
