Path sis_hash is now cached and used for __lt__, and __eq__. #95
base: master
As a note, in the current form this PR breaks my setups.
I assume by breaking your setup you mean some job hashes changed? I'm not sure right now how to deal with it. I don't see a way how to integrate We could also keep the current behavior and just add a comment that this is the reason for it. @albertz did this behavior cause any problem for you or did you just stumble upon it and wanted to know why it's implemented like this?
You change two things here:
I assume the caching logic could break some code. Maybe some code sets the I think that you can also directly fix
You are right, the caching should be its own PR and would need to be invalidated if Anyway, the sorting might change in some cases if hash_overwrite is considered, which breaks backwards compatibility. Leaving it like it is could lead to unexpected behavior: I would expect two Paths that return the same sis_id to also return the same hash. I think I'll make it an option for now, disable it by default, and test it in practice first.
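To make the "option, disabled by default" idea concrete, here is a minimal sketch. `SketchPath` and the `settings` class are hypothetical stand-ins, not the actual Sisyphus code; only the flag name `USE_SIS_HASH_FOR_PATH_COMPARISON` and the `hash_overwrite` attribute come from this thread, and the use of sha256 is an assumption.

```python
import hashlib


class settings:
    # Stand-in for a settings module; the flag is off by default.
    USE_SIS_HASH_FOR_PATH_COMPARISON = False


class SketchPath:
    """Hypothetical stand-in, not the real Sisyphus Path class."""

    def __init__(self, path, creator=None, hash_overwrite=None):
        self.path = path
        self.creator = creator
        self.hash_overwrite = hash_overwrite

    def _legacy_key(self):
        # Assumed old behavior: hash_overwrite is ignored for comparison.
        return (str(self.creator), self.path)

    def _sis_hash(self):
        # Assumed new behavior: hash_overwrite, if set, replaces (creator, path).
        if self.hash_overwrite is not None:
            key = self.hash_overwrite
        else:
            key = self._legacy_key()
        return hashlib.sha256(repr(key).encode()).digest()

    def _cmp_key(self):
        if settings.USE_SIS_HASH_FOR_PATH_COMPARISON:
            return self._sis_hash()
        return self._legacy_key()

    def __eq__(self, other):
        if not isinstance(other, SketchPath):
            return NotImplemented
        return self._cmp_key() == other._cmp_key()

    def __lt__(self, other):
        if not isinstance(other, SketchPath):
            return NotImplemented
        return self._cmp_key() < other._cmp_key()

    # __hash__ is deliberately left out of this sketch.
```

With the flag off, two objects that share a hash_overwrite but differ in creator and path compare as unequal; with the flag on, they compare as equal, so equality then agrees with the hash.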
It's even more complicated. If
Yes it changes the behavior, but I'm not sure if this really would break anything. Also, it is mostly (only?) really for the case only with
My suggestion in #94 was actually to keep the behavior of
The creator is always assumed to be a Job, and the sis_id of a Job is set and computed internally by Sisyphus at construction time and assumed to be fixed. If you overwrite these internal variables you are running into undefined behavior anyway. So caching would be doable. On the other hand, it probably doesn't save enough time to be worth the risk of breaking something.
I think the ordering stays the same as long as no hash_overwrite is used. Before it compares something like: So the only thing that changes are fixed items added on both sides, which shouldn't change the ordering. Well, at least as long as the total string isn't longer than 4096 (at that point the intermediate byte string will be hashed to avoid passing around arbitrarily long strings). I think this can be neglected since the maximum filename length is 255 characters for most Linux filesystems. The maximum path length in Linux is 4096, so it could be reached with many subfolders, but I don't think this is a realistic case here. @JackTemaki Does your setup still break with the current version if you set USE_SIS_HASH_FOR_PATH_COMPARISON = True?
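A rough sketch of that argument, under stated assumptions: `hash_helper_sketch` is a hypothetical helper, not the real Sisyphus function, the `"Path"` tag and the sample paths are made up, and sha256 is assumed as the hash. Only the 4096 threshold comes from the comment above.

```python
import hashlib

MAX_LEN = 4096  # threshold mentioned in the discussion


def hash_helper_sketch(parts):
    """Join parts into one byte string; hash it once it grows past MAX_LEN."""
    byte_str = b"(" + b", ".join(p.encode() for p in parts) + b")"
    if len(byte_str) > MAX_LEN:
        # sha256 is an assumption; the thread only says the string gets hashed.
        return hashlib.sha256(byte_str).digest()
    return byte_str


# Hypothetical sample paths.
paths = ["work/b/out", "work/a/out", "work/c/out"]

# Ordering of the raw strings versus the wrapped byte strings.
plain_order = sorted(paths)
wrapped_order = sorted(paths, key=lambda p: hash_helper_sketch(["Path", p]))

# A value past the threshold is replaced by a fixed-size digest,
# so its sort position no longer reflects the original string.
long_key = hash_helper_sketch(["Path", "x" * 5000])
```

For these sample values both orderings agree. A shared prefix never changes lexicographic order; a shared suffix can only matter when one value is a prefix of the other.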
Couldn't you update inputs via
No, update adds inputs, but the Job id stays the same. These additional inputs depend only on the original Job inputs and parameters, so the Job outputs also depend only on the originally given inputs and parameters.
Does not seem to be the case
PR to address the __lt__ and __eq__ issue for Path mentioned here: #94. __hash__ is a bigger problem since it's used by pickle even before all values are set.
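The pickle point can be reproduced with a small, self-contained sketch. `Node` is a hypothetical class, unrelated to Sisyphus: its `__hash__` reads an attribute, and the object sits inside a set that it references itself. On load, pickle creates the empty object, fills the set (which calls `__hash__`), and only afterwards restores the attributes.

```python
import pickle


class Node:
    """Hypothetical class whose hash depends on an instance attribute."""

    def __init__(self, name):
        self.name = name
        self.peers = set()

    def __hash__(self):
        # Fails if called before `name` has been restored.
        return hash(self.name)

    def __eq__(self, other):
        if not isinstance(other, Node):
            return NotImplemented
        return self.name == other.name


def roundtrip_fails():
    node = Node("a")
    node.peers.add(node)  # cycle: the object is a member of its own set
    data = pickle.dumps(node)
    try:
        pickle.loads(data)
    except AttributeError:
        # __hash__ ran while the instance had no attributes yet.
        return True
    return False
```

Calling `roundtrip_fails()` returns True here, which is why a `__hash__` that relies on fully initialized state is harder to change safely than `__lt__` or `__eq__`.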