How to avoid indexing after re-build docker? #110

Open
matthew-z opened this issue Jun 25, 2019 · 7 comments

@matthew-z
Member

matthew-z commented Jun 25, 2019

It seems that the jig runs indexing and commits the result to a new image. If my understanding is correct, after modifying the source code and building a new Docker image, we also have to re-index to create a new image. I wonder how to avoid this.

I think the most straightforward way is to keep the index in a directory on the host machine and mount it into the Docker container at launch. That way, even if the image is destroyed or outdated, we can still mount the index directory into a new container.
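A minimal sketch of the bind-mount idea described above, in Python since the jig is a Python tool. It only assembles a `docker run` command line using the standard `-v host:container` option. The image name, host path, and `/index` container path are hypothetical placeholders, not paths the jig actually uses.

```python
import os


def build_run_command(image, host_index_dir, container_index_dir="/index"):
    """Assemble a `docker run` argv that bind-mounts a host index directory.

    All names here are illustrative; the jig's real hooks may differ.
    """
    # Use an absolute host path so Docker treats it as a bind mount
    # rather than as the name of a named volume.
    host_path = os.path.abspath(host_index_dir)
    volume = f"{host_path}:{container_index_dir}"
    # --rm discards the container on exit; the index stays on the host.
    return ["docker", "run", "--rm", "-v", volume, image]


if __name__ == "__main__":
    # Hypothetical image tag and host directory.
    print(" ".join(build_run_command("my-search-image:latest", "./index-data")))
```

Because the index lives outside the image, rebuilding the image after a source change would leave the index untouched, at the cost of the jig having to track one more path.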

@albpurpura
Member

Hey Matthew, I proposed this option at the beginning, when we started designing the jig. In the end @lintool proposed saving the index in the image itself to reduce loading times. I implemented what you just proposed for the training jig instead, which saves the data to an external folder and allows trained models to be shared between images.

In any case you could save the index from one image to your host machine, then load the index data again if you wanted to.
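One way the "save the index from one image to your host machine" step could look, sketched as a Python helper that assembles standard Docker CLI commands. `docker cp` operates on containers rather than images, so the sketch first creates a stopped container from the image. The image tag, the `/index` path, and the container name are assumptions for illustration only.

```python
def build_export_commands(image, container_index_dir, host_dest, name="index-export"):
    """Return docker CLI commands that copy a directory out of an image.

    `docker cp` works on containers, so a stopped container is created
    from the image first and removed afterwards.
    """
    return [
        # Create (but do not start) a container from the indexed image.
        ["docker", "create", "--name", name, image],
        # Copy the index directory from the container to the host.
        ["docker", "cp", f"{name}:{container_index_dir}", host_dest],
        # Clean up the temporary container.
        ["docker", "rm", name],
    ]


if __name__ == "__main__":
    # Hypothetical image tag and paths.
    for cmd in build_export_commands("my-search-image:indexed", "/index", "./index-data"):
        print(" ".join(cmd))
```

The exported directory could then be mounted into a container started from a newer image.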

@arjenpdevries
Member

I was thinking of doing it similarly. A good way would be to add a flag that passes a directory to be mounted as a volume for data storage, just like the /input mount. Did you do that, or just hardcode it, @albpurpura?
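A rough sketch of what such a flag might look like with `argparse`. The flag names `--data-dir` and `--data-mount` and the `/data` default are hypothetical and are not taken from the jig's actual command-line interface.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical hook launcher")
    # Hypothetical flag: host directory to mount for persistent data.
    parser.add_argument("--data-dir", default=None,
                        help="host directory to mount as a volume")
    # Hypothetical flag: where the directory appears inside the container.
    parser.add_argument("--data-mount", default="/data",
                        help="mount point inside the container")
    return parser.parse_args(argv)


def volume_args(args):
    """Return the extra `docker run` arguments for the optional mount."""
    if args.data_dir is None:
        return []  # no flag given: nothing is mounted
    return ["-v", f"{args.data_dir}:{args.data_mount}"]


if __name__ == "__main__":
    print(volume_args(parse_args(["--data-dir", "/home/user/index"])))
```

Making the mount optional keeps the existing behaviour (index stored in the image) as the default when the flag is omitted.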

@matthew-z
Member Author

matthew-z commented Jun 25, 2019

I see, so we can use model_folder to mount any data into the container with the train hook.
In that case, it would be great to add a similar argument to the other hooks for mounting data from the host machine.

@albpurpura
Member

@arjenpdevries I did it exactly as you said. The folder to mount is passed as an argument; have a look here: https://github.com/osirrc/jig/blob/master/trainer.py

@lintool
Member

lintool commented Jun 25, 2019

> In the end @lintool proposed to save the index in the image within the docker to reduce the loading times.

Correct. This is a tradeoff between jig complexity (one more thing the jig needs to manage) vs. image efficiency (having to rebuild the index each time). At the start, we opted to simplify the jig since we were just getting started. However, now that things are working, I'm happy to revisit for v2.

@cmacdonald
Member

@matthew-z I had some scripts that allow updating the scripts in an existing image. See https://github.com/osirrc/terrier-docker/blob/master/dev/bumpContainer.sh
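The sketch below does not reproduce the linked script. It shows one generic way to refresh files in an existing image with standard Docker CLI commands (`create`, `cp`, `commit`, `rm`), assembled in Python. All image tags, paths, and the container name are hypothetical.

```python
def build_update_commands(image, host_src, container_dest, new_tag, name="image-update"):
    """Return docker CLI commands that copy updated files into an image.

    The existing image layers, including any index stored in them, are
    kept; only the copied files change in the committed image.
    """
    return [
        # Create a stopped container from the already-indexed image.
        ["docker", "create", "--name", name, image],
        # Copy the updated scripts from the host into the container.
        ["docker", "cp", host_src, f"{name}:{container_dest}"],
        # Save the modified container as a new image tag.
        ["docker", "commit", name, new_tag],
        # Remove the temporary container.
        ["docker", "rm", name],
    ]


if __name__ == "__main__":
    # Hypothetical tags and paths.
    for cmd in build_update_commands("my-search-image:indexed", "./scripts",
                                     "/work/scripts", "my-search-image:indexed-v2"):
        print(" ".join(cmd))
```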

@matthew-z
Member Author

@cmacdonald Great! Thank you!
