CLI: cache outputs & Fuzzy Starts #645
I am not going to implement a clear cache command. There's too much low-level complexity associated with it. If you do We could take a path to the cache. But is that a path to .cli-cache? The parent folder? The target workflow? Do you want to clear for all workflows or just for one workflow? Basically the path feels so ambiguous and unintuitive, it's not clear to me what it'll do. Ok, so we can confirm the path before we do anything, but it's still an annoying command. Isn't The other option is Maybe we'll come back to it later - right now it feels super high complexity for incredibly low value.
Hi @mtuchi, when you've got a spare half hour can you take another look at this, maybe run a couple of tests? Thanks!
@josephjclark i have tested the |
The runtime will exit after this step has been executed (even if there are more steps outstanding)
Hi @mtuchi - last one I hope! This PR now supports I plan to merge and release this in the morning. Let me know if you get a chance to test it out
Always report when you're looking for a cache, and only warn if no input was found
If start is passed but state is not, we always try to load from the cache and warn if we didn't load anything. If no start is passed, we never try to load from the cache.
let customEnd;

// Handle start, end and only
if (options.only) {
This interaction here - not just the working out which step to run, but how it is fed to the runtime AND reported to the user - is not well tested.
It's taken quite a while to work out a good UX.
I don't want to spend much more time on this but maybe I'll see if I can find a good way to test this. I kinda worry that it'll take half a day to get a good suite of tests on it - not worth it right now.
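For anyone picking up the testing later, here is a rough sketch of the kind of pure function that would make the start/end/only handling easy to unit test. It is not the code in this PR: the names `StepOptions`, `StepRange` and `resolveStepRange` are hypothetical, and treating `--only` as "start and end on the same step" and rejecting it alongside `--start`/`--end` are assumptions.

```typescript
// Hypothetical option shape; the real CLI options object may differ.
interface StepOptions {
  start?: string;
  end?: string;
  only?: string;
}

interface StepRange {
  start?: string;
  end?: string;
}

// Collapse --start, --end and --only into a single start/end pair.
// Assumption: --only means "start and end on this one step" and
// cannot be combined with --start or --end.
function resolveStepRange(options: StepOptions): StepRange {
  if (options.only) {
    if (options.start || options.end) {
      throw new Error('--only cannot be combined with --start or --end');
    }
    return { start: options.only, end: options.only };
  }
  return { start: options.start, end: options.end };
}
```

Keeping this logic free of I/O means the reporting to the user could be tested separately from working out which steps to run.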
Short Description
This PR:
- Caches step output when --cache-steps is passed
- Loads input state from the cache for --start if input is not explicitly passed in
- Supports --end and --only as CLI arguments

Related issue
Fixes #375
Docs
PR open at OpenFn/docs#467
Caching
If the --cache-steps flag is passed to the CLI, every step will write its output to a local folder as a JSON file.
Caching is useful for debugging, but the real benefit comes from re-running a workflow from a fixed point.
If the --start flag is passed, the CLI will automatically load the correct input state from the cache for that step. This is clearly logged.
Note that --cache-steps and --start are mutually exclusive. If you run --start without --cache-steps then the cached output will NOT be updated.
To work out the right input state, the CLI must find the upstream step of the start job. At the moment, this is easy because in a workflow, each step can only have one input. If that rule ever changes, this part of the feature becomes more complex (I've added a comment in workflow validation to this effect).
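To illustrate the upstream lookup, here is a minimal sketch that relies on the one-input-per-step rule. The `Step` shape (an `id` plus a `next` map keyed by downstream step id) and the function name `findUpstreamStepId` are assumptions for illustration, not the actual types in this PR.

```typescript
// Hypothetical, simplified workflow shape for illustration only.
interface Step {
  id: string;
  next?: Record<string, unknown>; // keys are downstream step ids
}

// Find the single step that feeds into `startId`.
// Returns undefined if nothing points at `startId` (e.g. it is the first step).
function findUpstreamStepId(steps: Step[], startId: string): string | undefined {
  const upstream = steps.filter(
    (s) => s.next !== undefined && startId in s.next
  );
  if (upstream.length > 1) {
    // This would only happen if the one-input-per-step rule changes.
    throw new Error(`Step ${startId} has more than one input`);
  }
  return upstream[0]?.id;
}
```

The cached output of the returned step would then be used as the input state for the start step.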
The cache is written to a folder called .cli-cache adjacent to the workflow or job file. A sub folder will be created with the workflow name (which defaults to the file name), and a JSON file will be created for each step (with the step id).
So for .tmp/workflow.json you'll get a cache path something like ./.cli-cache/workflow/step-1.json.
Caching is off by default.
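As a rough sketch of how a cache path could be derived from the rules above (a `.cli-cache` folder next to the workflow file, then the workflow name, then the step id), something like the following would work. The function name `getCachePath` and its signature are hypothetical and not necessarily what `cli/src/util/cache.ts` exposes.

```typescript
import * as path from 'node:path';

// Build the cache file path for one step.
// Assumption: the workflow name defaults to the file name without its extension.
function getCachePath(
  workflowPath: string,
  stepId: string,
  workflowName?: string
): string {
  const dir = path.dirname(workflowPath);
  const name =
    workflowName ?? path.basename(workflowPath, path.extname(workflowPath));
  return path.join(dir, '.cli-cache', name, `${stepId}.json`);
}
```

Note that this sketch returns a path relative to the folder containing the workflow file, so `getCachePath('.tmp/workflow.json', 'step-1')` resolves inside `.tmp`.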
Setting the OPENFN_ALWAYS_CACHE_STEPS env var will default step caching ON. Disable by passing --no-cache-steps.
Fuzzy Starts
The PR also enables "fuzzy" start points to be defined.
This behaviour exists outside of the caching stuff, but the caching stuff benefits from it because it's easier to specify a start node now.
The idea here is that a) you may want to use the step name or id as the start point, depending on what your workflow json looks like; and b) if you're using a project downloaded from Lightning, the ids are a nightmare to type.
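To give a feel for what a fuzzy start could look like, here is a minimal sketch. The matching rule used here (exact id first, then a case-insensitive substring match against ids and names, failing on ambiguity) is an assumption, and `NamedStep` and `matchStep` are hypothetical names; the PR's actual matching may differ.

```typescript
// Hypothetical step shape for illustration only.
interface NamedStep {
  id: string;
  name?: string;
}

// Resolve a user-supplied pattern to a single step id.
function matchStep(steps: NamedStep[], pattern: string): string {
  // An exact id match always wins.
  const exact = steps.find((s) => s.id === pattern);
  if (exact) return exact.id;

  // Otherwise fall back to a case-insensitive substring match on id or name.
  const needle = pattern.toLowerCase();
  const candidates = steps.filter(
    (s) =>
      s.id.toLowerCase().includes(needle) ||
      (s.name ?? '').toLowerCase().includes(needle)
  );

  const [first] = candidates;
  if (candidates.length === 1 && first) return first.id;
  if (candidates.length === 0) {
    throw new Error(`No step matches "${pattern}"`);
  }
  const ids = candidates.map((s) => s.id).join(', ');
  throw new Error(`"${pattern}" is ambiguous, it matches: ${ids}`);
}
```

With Lightning-style ids, this would let you pass a few characters of the id or part of the step name instead of typing the whole id.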
TODO
- cli/src/util/cache.ts
- Add an API to clear the cache. I am not going to do this; it's actually easier for everyone to rm -rf ./.cli-cache
- --start to fuzzy match against step names and ids, for good UX
- --cache and --start mutually exclusive (probably)

Checklist before requesting a review