Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for consistent dependency package behavior across platforms #18728

Closed
jacwil opened this issue Jan 29, 2024 · 4 comments
Closed

Proposal for consistent dependency package behavior across platforms #18728

jacwil opened this issue Jan 29, 2024 · 4 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. zig build system std.Build, the build runner, `zig build` subcommand, package management

Comments

@jacwil
Copy link

jacwil commented Jan 29, 2024

Problem

Issues #17652 and #18089 where package contents causes failures or inconsistent hashes.

Summarizing the above linked issues: symlinks cause AccessDenied failures on Windows and the package hash on Linux may differ from MacOS and Windows because the latter are case-insensitive with their filenames.

Proposal

This proposal is to provide build.zig.zon functionality developers can use to resolve simple issues encountered when building on certain OSes. Current options include 1) forking the upstream package to resolve the issues or implicitly 2) for a package to limit its OS support.

It does not address the need for a package linter. Reporting and/or enforcing developers to avoid known issues (i.e. linting) is out of scope of this proposal.

New fields being proposed in build.zig.zon dependencies:

  • renames as described in package hash differs between operating systems due to file system case sensitivity #18089 (comment) by marler8997.
  • exclude_paths is a list of paths like paths, but to exclude for the purpose of resolving conflicts.
  • copy_symlink_target_max_depth defaults to 0 to imply "fail on OSes that do not support symlinks instead of copying" (behavior today) and a positive integer for how deep to follow symlinks. Any symlinks still remaining after the max depth has been reached would be implicitly excluded. This operation would be applied to all OSes when a positive integer is provided to ensure package hashes match.

Case for renames field: Despite a brief review of conflicts in tar files showing the contents are duplicate, it is very possible to have different contents and thus result in information loss when using just an include/exclude strategy. Support for renaming avoids the data loss.

Case for exclude_paths field: Primarily to remove symlinks, but could be used to remove duplicate files that exist because upstream wrote the file to the package again but with different casing. Also could be used (and abused?) by developers as an optimization to remove unnecessary files. This is seen from the perspective that not all dependencies have package manifests. And those that do have package manifests might include symlinks and naming conflicts.

Case for copy_symlink_target_max_depth field: Can be used to resolve simple symlinks issues for OSes like Windows that cannot be resolved with exclude_paths.

Misc Thoughts

Why exclude_paths instead of reusing the existing paths field? paths is purely an inclusion list. When the developer only wants to exclude a file or two that's a symlink or causing a naming conflict, they have to specify all sibling files and directories. This could be dozens or hundreds of files they must specify.

The gitignore pattern format was considered but might be over-engineering the issue exclude_paths is looking to solve. Packaging issues are expected to be rare. Ideally there's a small number of files to remove if any. Aside the complexity add the gitignore pattern format would introduce, any upstream package with many many issues may not benefit from the feature if the paths could not be globbed together.

@slonik-az
Copy link

slonik-az commented Jan 29, 2024

I do not like renaming. It treats the symptom not the "disease". Renaming means that one has to chase @import cImport and such and update filenames there.

To avoid data loss on case-insensitive systems like MacOS a conforming package should have no file or directory name collisions when compared case-insensitive. Package hashing tooling should refuse to generate hash if collisions are found. Otherwise hash is generated using canonical low-case converted filenames instead of real ones. The order of files/directories in hashing operation is lexicographical by low-case names. This guarantees that the package hash is reproducible on all file systems whether case-sensitive or not. Authors of packages with files/directories that differ only by case of letters have to manually rename the offenders which is a one-time hassle. And IMHO it also makes package structure more readable. A situation that two files override each other on MacOS is avoided.

@jacwil
Copy link
Author

jacwil commented Jan 30, 2024

I have only a few cherry-picked usages to refer to. Mostly the upstream package being consumed is not maintained by the developer and there is little or no incentive for upstream to correct any issues encountered when a developer attempts to import the tarball directly from upstream.

That said, I might be completely off about how Zig's build system consumes packages. Maybe the intention is never to consume from upstream and the developer should always repackage upstream into something with a zig build manifest. In that case I 100% agree with you, all packages should conform to avoid issues with naming conflicts and symlinks and inconsistent hashing.

Concerning the situation where two files conflict with each other on MacOS and Windows, I'm fully expecting Zig to error in the near future. I am currently looking into making the change andrewrk described in #18089 (comment) while this proposal cooks:

The first step is to implement errors in the tar unpacking code when a tar file cannot be extracted correctly due to the file system being incompatible with the tar file.
...
Just like with symlinks, this will cause URLs to work on some operating systems but fail to be fetched on others.

@jacwil
Copy link
Author

jacwil commented Jan 30, 2024

Something I'm trying very hard not to do when I approach this problem is enforce the developer (or upstream) to do anything. I'm considering enforcement scope creep for my purpose: consistent behavior across platforms when referencing a package. That package might be another Zig project, a tarball, or a git repository.

Rant warning...

I'm keeping away from enforcement for a second reason though. I do not want to be the one that destroys what makes certain OSes great for the purpose of conformity. My opinion and bias:

  1. I do not know of any good reason to keep case sensitivity in filenames. I do not know of any no issue in enforcing all paths of build.zig.zon's paths field to be treated as case-insensitive. Prime target of a linter. If someone is treating filenames as a key in a hashtable then I would argue they could have done better by using a database or custom file format.
  2. I do think it would be a tragedy to block the use of symlinks. The benefits of deduplicating output can save a lot of disk space and I/O copying around bits. Allowing a Zig developer to use symlinks in their output when the target supports would be beneficial, Windows be damned.

Snapping to the least common denominator is a way to solve problems. Not usually the best way if perf is a top goal. I'm proposing to implement the least common denominator in zig --fetch by treating symlinks as a copy directive. I don't like it. I think it falls short of Zig's goals. I think its a code smell caused by how Zig is validating the package referenced hasn't changed- a leak of how Zig implements caching of packages. But I don't know of a better solution and requesting a per-OS hash in build.zig.zon seems like a terrible idea.

But I also don't want to push my immature and limited view of the problem onto the Zig project. I do not know enough and very likely made some critical mistakes in how I'm seeing things. If I see areas I can contribute that push Zig in the direction it wants to go and also aligns with my perspective of what I want Zig to become, I think that's a win-win. But enforcement and reporting, specifically enforcement and reporting on symlinks, does not align with my current perspective.

Linting and enforcing on naming conflicts is something that I can get behind and support. And maybe it deserves to be an item in this proposal. But its my proposal and I'm going to pull the defer card for being out of scope. I think its better saved for when its easier for a developer to repackage an offending upstream source with an assumption that upstream cannot (or refuses) to make the change. I don't like to be held hostage by upstream anymore than Zig likes to force itself on others.

@andrewrk andrewrk added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. zig build system std.Build, the build runner, `zig build` subcommand, package management labels Jan 31, 2024
@andrewrk
Copy link
Member

I've already outlined how this is to be solved:

  • If the package cannot be extracted correctly on the host OS, then an error occurs
  • The paths affects what constitutes a package being extracted correctly.
  • No renames. No other kind of configuration.

@andrewrk andrewrk closed this as not planned Won't fix, can't repro, duplicate, stale Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. zig build system std.Build, the build runner, `zig build` subcommand, package management
Projects
None yet
Development

No branches or pull requests

3 participants