[help] How to handle mixed local and cloud storage #1381
Unanswered · tarensanders asked this question in Help
Replies: 1 comment 1 reply
What about instead of syncing _targets/meta/meta with version control, use |
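The reply is cut off, but it points at an alternative to committing `_targets/meta/meta` to git. One mechanism in this area (an assumption on my part, not necessarily what the replier meant, and dependent on a recent {targets} version) is letting {targets} mirror its metadata to the same cloud bucket:

```r
# Hypothetical sketch: keep the metadata in the cloud too, instead of
# committing _targets/meta/meta to version control.
# Bucket name is a placeholder; requires a recent {targets} release.
library(targets)
tar_option_set(
  repository_meta = "gcp",  # mirror metadata to the GCS bucket
  resources = tar_resources(
    gcp = tar_resources_gcp(bucket = "my-project-bucket")
  )
)
# tar_make() then uploads metadata after the run, and a fresh clone
# can fetch it with tar_meta_download() before running the pipeline.
```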
I've been using cloud storage with great success, but I ran into an issue when moving between machines with a mix of cloud and local storage.
I'm using mixed storage methods to reduce the cost of cloud storage. A few targets take a very long time to generate, so those get sent to the cloud; some much faster targets (some of which generate large objects) make more sense to keep locally rather than pay for them to be checked on each pipeline run. I'm using GCS, so as a sketch the pipeline looks something like:
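The original sketch is missing here; a minimal `_targets.R` along these lines would match the setup described (target names, commands, and the bucket name are placeholders, not the asker's actual pipeline):

```r
# _targets.R -- hypothetical sketch of the mixed-storage setup
library(targets)

tar_option_set(
  resources = tar_resources(
    gcp = tar_resources_gcp(bucket = "my-project-bucket")
  )
)

list(
  # slow, expensive target: stored in GCS so any machine can reuse it
  tar_target(slow_target, expensive_model_fit(), repository = "gcp"),
  # fast but large target: kept in the local _targets/objects/ store
  tar_target(fast_target, quick_big_object(), repository = "local")
)
```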
I then keep _targets/meta/meta under version control. This works great, but the issue comes when I set up another computer. I needed to use an HPC to rerun the slow targets, so I cloned the repo and tried to rerun the pipeline, but I hit an error: targets was trying to access a file that doesn't exist. In this example, targets would be looking for _targets/objects/fast_target, but because _targets/objects doesn't exist on the new machine, this fails. In the end I just copied the local objects over to resolve the issue, but that seems hacky.

In retrospect I probably should have used tar_invalidate on the local targets. But is there any other way to deal with this?
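For the tar_invalidate approach the asker mentions, a minimal sketch on the freshly cloned machine would look like this (target name taken from the example above; this assumes the local targets are cheap enough to simply rebuild):

```r
library(targets)
# Drop the metadata records of the local-repository targets so the
# pipeline rebuilds them instead of looking them up in the missing
# _targets/objects/ store. Cloud-stored targets remain up to date.
tar_invalidate(any_of("fast_target"))
tar_make()
```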