How to avoid indexing after re-build docker? #110

matthew-z · 2019-06-25T07:14:06Z

It seems that jig builds the index and commits it to a new image. If my understanding is correct, after modifying the source code and rebuilding the Docker image, we also have to re-index to create a new image. I wonder how to avoid this.

I think the most straightforward way is to keep the index in a directory on the host machine and mount it into the Docker container at launch. Then, even if the image is destroyed or outdated, we can still mount the index directory into a new container.
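As a rough illustration of the idea above, here is a minimal sketch that builds a `docker run` command with the index bind-mounted from the host. The image name `my-search-image`, the host path `./index`, and the container path `/index` are placeholders chosen for the example, not names used by jig.

```python
from pathlib import Path
from typing import List


def docker_run_with_index(image: str, host_index_dir: str,
                          container_index_dir: str = "/index") -> List[str]:
    """Build a `docker run` command that bind-mounts a host index directory.

    All names here are illustrative; jig itself may launch containers differently.
    """
    # With -v, the host side must be an absolute path; otherwise Docker
    # interprets it as a named volume rather than a host directory.
    host_path = Path(host_index_dir).resolve()
    return [
        "docker", "run", "--rm",
        # -v HOST:CONTAINER makes the host directory visible inside the container.
        "-v", f"{host_path}:{container_index_dir}",
        image,
    ]


cmd = docker_run_with_index("my-search-image", "./index")
print(" ".join(cmd))
```

Because the index lives outside the image, rebuilding the image after a code change would leave the index untouched; a new container only needs the same mount.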

albpurpura · 2019-06-25T07:19:02Z

Hey Matthew, I proposed this option at the beginning, when we started designing the jig. In the end @lintool proposed saving the index inside the Docker image to reduce loading times. I implemented what you just proposed for the training jig instead, which saves the data to an external file and allows trained models to be shared between images.

In any case, you could copy the index from one image to your host machine and load it again later if you wanted to.
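One possible way to do that copy, sketched under assumptions: create a stopped container from the image, use `docker cp` to pull the index directory out, then remove the container. The image name, the `/index` path inside the image, and the container name `index-export` are hypothetical; the actual index location depends on the image.

```python
from typing import List


def export_index_commands(image: str, container_index_dir: str,
                          host_dir: str,
                          name: str = "index-export") -> List[List[str]]:
    """Return the Docker commands that copy an index out of an image.

    The paths and names are placeholders; adjust them to the real image layout.
    """
    return [
        # Create (but do not start) a container so its filesystem can be read.
        ["docker", "create", "--name", name, image],
        # Copy the index directory from the container to the host.
        ["docker", "cp", f"{name}:{container_index_dir}", host_dir],
        # Remove the temporary container.
        ["docker", "rm", name],
    ]


commands = export_index_commands("my-search-image", "/index", "./index-backup")
for command in commands:
    print(" ".join(command))
```

Each command list could be passed to `subprocess.run(command, check=True)` to actually execute it.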

arjenpdevries · 2019-06-25T07:27:51Z

I was thinking of doing it similarly. A good way would be to add a flag that passes a directory to be mounted as a volume for data storage, just like the /input mount. Did you do that, or just hardcode it, @albpurpura?

matthew-z · 2019-06-25T07:29:02Z

I see, we can use the model_folder to mount any data into the container with the train hook.
Then I think it would be great to add a similar argument to the other hooks for mounting data from the host machine.
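A minimal sketch of what such an argument could look like, assuming a hypothetical `--index-dir` flag and a `/index` mount point (neither is taken from jig's actual command line): when the flag is given, it is turned into a `-v` option for `docker run`; when it is absent, nothing is mounted.

```python
import argparse
import os
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical hook launcher; flag names are illustrative."""
    parser = argparse.ArgumentParser(description="Hypothetical hook launcher")
    parser.add_argument("--image", required=True, help="Docker image to run")
    parser.add_argument("--index-dir", default=None,
                        help="host directory to mount for index data (optional)")
    return parser


def volume_args(index_dir: Optional[str],
                mount_point: str = "/index") -> List[str]:
    """Translate the optional host directory into `docker run` volume options."""
    if not index_dir:
        return []  # No flag given: keep the current behaviour, mount nothing.
    return ["-v", f"{os.path.abspath(index_dir)}:{mount_point}"]


args = build_parser().parse_args(["--image", "my-search-image",
                                  "--index-dir", "/tmp/idx"])
docker_cmd = ["docker", "run", "--rm", *volume_args(args.index_dir), args.image]
print(" ".join(docker_cmd))
```

Keeping the flag optional means existing invocations would continue to work unchanged.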

albpurpura · 2019-06-25T07:29:46Z

@arjenpdevries I did it exactly as you said. The folder to mount is passed as an argument; have a look here: https://github.com/osirrc/jig/blob/master/trainer.py

lintool · 2019-06-25T11:21:14Z

> In the end @lintool proposed to save the index in the image within the docker to reduce the loading times.

Correct. This is a tradeoff between jig complexity (one more thing the jig needs to manage) vs. image efficiency (having to rebuild the index each time). At the start, we opted to simplify the jig since we were just getting started. However, now that things are working, I'm happy to revisit for v2.

cmacdonald · 2019-07-01T16:28:02Z

@matthew-z I had some scripts that update the scripts inside an already existing image. See https://github.com/osirrc/terrier-docker/blob/master/dev/bumpContainer.sh

matthew-z · 2019-07-01T17:11:23Z

@cmacdonald Great! Thank you!
