Skip to content

zig build: add env_map entries to hash for Step.Run #20874

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 26, 2025

Conversation

dweiller
Copy link
Contributor

@dweiller dweiller commented Jul 30, 2024

This change fixes false-positive cache hits for run steps that get run with different sets of environment variables due the the environment map being excluded from the cache hash.

I assume the exclusion of the env_map field from the cache hash of a Step.Run was an oversight as it is important for use cases where environment variables are used to control program behaviour (I encountered this issue when using AFLPlusPlus where environment variables are used to enable different instrumentation options). If it was intentional I can open an issue to discuss changing this behaviour.

The implementation in this PR sorts the environment variables by the variable name to ensure that any reordering does not cause a cache miss. I have not implemented special-case logic for windows to normalise environment variable name case before hashing, so you can get cache misses if different codes paths in build.zig add the same environment variable (and value) but use different casing for the variable name.

@dweiller dweiller force-pushed the build-env-var branch 2 times, most recently from bac8640 to 44d1544 Compare August 2, 2024 00:55
Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still wrong... please use std.mem.sortUnstable

@dweiller
Copy link
Contributor Author

dweiller commented Aug 2, 2024

still wrong... please use std.mem.sortUnstable

Done.

This change fixes false-positive cache hits for run steps that get run
with different sets of environment variables due the the environment map
being excluded from the cache hash.
Copy link
Member

@alexrp alexrp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me.

@squeek502 any thoughts on the Windows thing?

@squeek502
Copy link
Collaborator

squeek502 commented Mar 26, 2025

For Windows, I believe you'd want to do something like:

Before the kv_pairs.appendAssumeCapacity:

  • Use std.unicode.Wtf8Iterator to iterate over the key's code points (or maybe code point slices?)
  • If the code point is <= std.math.maxInt(u16), then:
    • Call std.os.windows.ntdll.RtlUpcaseUnicodeChar on it
    • Call std.unicode.wtf8Encode with a buf of 3 bytes (RtlUpcaseUnicodeChar returns a u16 which means the code point can always be encoded as 3 WTF-8 bytes)
    • Add the uppercased WTF-8 bytes to the normalized key buffer
  • Else just add the code point's bytes to the normalized key buffer, Windows doesn't have uppercase data for code points > std.math.maxInt(u16)
  • Append the normalized key and the (verbatim) value

(but it seems like a quite rare situation where this normalization would matter if I'm thinking about it right)

@alexrp
Copy link
Member

alexrp commented Mar 26, 2025

Yeah, I don't think this necessarily needs to block the PR considering how much of an edge case it is. IIUC, it seems like the worst case scenario is that two otherwise identical commands using e.g. FOO=bar and foo=bar would be run and cached separately?

I think it'd be fine to merge this as-is and leave the Windows fix as follow-up work, but I'll let @dweiller decide if he feels like doing it in this PR.

@dweiller
Copy link
Contributor Author

IIUC, it seems like the worst case scenario is that two otherwise identical commands using e.g. FOO=bar and foo=bar would be run and cached separately?

Yes, that would be what happens without normalisation.

I think it'd be fine to merge this as-is and leave the Windows fix as follow-up work, but I'll let @dweiller decide if he feels like doing it in this PR.

I'd rather leave normalisation on Windows to follow-up work if possible as I'm not familiar with Windows path name details (and wouldn't be able to test it locally). While I don't think it's ideal to not have normalisation I expect it be be very rare someone would hit it, and it is at least possible to work around in build.zig by making sure they always use the same case (assuming they notice the cache misses and figure out why it's happening). Note that if anyone hits this edge case, their build.zig is also probably broken on other operating systems unless their inconsistent case logic only gets run on Windows.

@alexrp alexrp added this to the 0.14.1 milestone Mar 26, 2025
@alexrp alexrp merged commit b84db31 into ziglang:master Mar 26, 2025
10 checks passed
@alexrp
Copy link
Member

alexrp commented Mar 26, 2025

Follow-up filed at #23366.

@dweiller dweiller deleted the build-env-var branch April 9, 2025 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants