-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: make exla build resilient to stale upgrades #1548
Conversation
Looks good to me! |
exla/README.md
Outdated
|
||
We recommend following the steps below: | ||
|
||
1. If the error appeared after upgrading EXLA, ensure that you have the proper dependency versions given by [XLA](https://github.com/elixir-nx/xla). Afterwards, compile with `mix compile --clean-libexla-cache --force` to clean up all cached files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mix compile
is not going to work when compiled as a dependency. We would need to introduce a separate mix compiler or a separate task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won’t work in the sense that it won’t invoke the code you added in the mix.exs file. mix compile
is only about the local project. Maybe an env var with mix deps.compile exla
will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@josevalim it may works as mix deps.clean exla --build --clean-libexla-cache
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err mix deps.compile exla --force --clean-libexla-cache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But yeah, env var sounds better either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we pass options forward in all of these cases. Although we should probably make mix deps.clean
for dependencies extensible, so we actually invoke clean
in the relevant dep.
exla/mix.exs
Outdated
defp cached_make(_) do | ||
defp cached_make(args) do | ||
{parsed, _args, _invalid} = | ||
OptionParser.parse(args, strict: [clean_libexla_cache: :boolean, force: :boolean]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion to make this a single env var called EXLA_FORCE_REBUILD, as explained above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Env var is probably the simpler alternative :)
exla/mix.exs
Outdated
@@ -150,14 +167,21 @@ defmodule EXLA.MixProject do | |||
cached_so = Path.join([xla_cache_dir(), "exla", cache_key, "libexla.so"]) | |||
cached? = File.exists?(cached_so) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can mostly remove the changes to this function and instead simply write this:
cached? = File.exists?(cached_so) | |
cached? = System.get_env("XLA_FORCE_REBUILD") == nil and File.exists?(cached_so) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be enough to force a rebuild and write the new build to cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@josevalim I changed to using env vars, but kept 2 separate ones because there are many instances in which I wanted to leverage the .o
caching while getting rid of the libexla.so file and cache, as well as instances where I wanted to clean-slate everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My problem with the above is that, if something is not working, I need to set two env vars to clear everything. So why not a single env var with different values?
XLA_FORCE_REBUILD=1
- remove allXLA_FORCE_REBUILD=partial
- keep .o
Also, is there any situation where you don't want to cache files to be rewritten? I don't see a reason to skip that as part of these options?
Finally note that we use XLA
env vars for pretty much everything, since we have XLA_CACHE_DIR
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with having special values for the env var and having a single one.
I disagree with using XLA
as the prefix, though, as this doesn't have anything to do with :xla
. XLA_CACHE_DIR
being used in EXLA is just an artifact of "let's use the same subdir for both XLA and EXLA". The origin of the variable is in :xla
.
Also, is there any situation where you don't want to cache files to be rewritten? I don't see a reason to skip that as part of these options?
As for this part, I don't think we are skipping the rewrite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored the cached?
check so that there's less confusion about whether rewrites will happen.
cached?
will now be set to false regardless of the file existing if force rebuild is set.
Co-authored-by: José Valim <[email protected]> Co-authored-by: Jonatan Kłosko <[email protected]>
…x/nx into pv-fix/exla-build-for-upgrading
@@ -27,14 +27,13 @@ defmodule EXLA.DeviceMemorySharingTest do | |||
end | |||
|
|||
@tag :cuda_required | |||
test "ipc handles don't crash the runtime when :local mode is selected" do | |||
assert {:error, ~c"Invalid pointer size for selected mode."} == | |||
test "invalid ipc handles don't crash the runtime" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was breaking locally (I guess I hadn't run it on my desktop since the update 😛)
exla/mix.exs
Outdated
false | ||
|
||
"0" -> | ||
false | ||
|
||
"partial" -> | ||
:partial | ||
|
||
"true" -> | ||
true | ||
|
||
"1" -> | ||
true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
false | |
"0" -> | |
false | |
"partial" -> | |
:partial | |
"true" -> | |
true | |
"1" -> | |
true | |
:none | |
"0" -> | |
:none | |
"partial" -> | |
:partial | |
"true" -> | |
:full | |
"1" -> | |
:full |
This should force all the code below to be explicit :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final nitpick and ship it!
Co-authored-by: José Valim <[email protected]> Co-authored-by: Jonatan Kłosko <[email protected]>
fixes #1546