Skip to content

Limit the size of cgu names when using the `-Zhuman-readable-cgu-name… #141558

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Diggsey
Copy link
Contributor

@Diggsey Diggsey commented May 25, 2025

…s` option

Prior to this change, cgu names could be generated which would result in filenames longer than the limit imposed by the OS.

Fixes #141547

@rustbot
Copy link
Collaborator

rustbot commented May 25, 2025

r? @matthewjasper

rustbot has assigned @matthewjasper.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 25, 2025
@@ -468,6 +469,20 @@ impl<'tcx> CodegenUnit<'tcx> {
hash.as_u128().to_base_fixed_len(CASE_INSENSITIVE)
}

pub fn shorten_name(human_readable_name: &str) -> Cow<'_, str> {
// Set a limit a somewhat below the common platform limits for file names.
const MAX_CGU_NAME_LENGTH: usize = 200;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we can query from the OS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be less than the actual filename limit, since the extension and other bits are added later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even that aside, while on Windows we could use GetVolumeInformationByHandleW to query the maximum component length for local filesystems, the answer for both NTFS and ReFs is always 255 so I don't really see it as worth it. And I think it's very unlikely that it'll change in the near future for API reasons (there are some APIs that have a fixed-sized buffer for component names).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Browsing the limits it looks like 255 is by far the most common amongst filesystems https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A cgu name of 200 characters will still cause errors on Windows if the path to the directory it is in is 60 or more characters.

Copy link
Member

@ChrisDenton ChrisDenton May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the below conversation, verbatim paths can be used to workaround that. IIRC llvm handles long paths now and rustc does by virtue of using std APIs.

let mangled_name = Self::mangle_name(human_readable_name);
let truncate_to = human_readable_name
.floor_char_boundary(MAX_CGU_NAME_LENGTH - mangled_name.len() - 1);
format!("{}-{}", &human_readable_name[..truncate_to], mangled_name).into()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear that this can result in CGU collisions, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the mangled name is still included

Copy link
Member

@fmease fmease May 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a Windows dev but wouldn't it make sense to "just" use extended-length paths on Windows (whose length limit is 216/2-1 = 32,767) — prefix is \\?\. Docs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended paths seem like the better solution to me as well since that means the logic to name the file will be the same regardless of platform.

Copy link
Contributor Author

@Diggsey Diggsey May 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are confusing paths with filenames. The limit here is a filename limit and is a limitation of the file-system, not just the OS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's practical to determine the actual limit, since there are so many factors involved (file-system, OS, etc.) not to mention that when the object files are combined into an archive, the archive file format may have its own limit.

IMO it's much better to impose a simple limit that will work everywhere. Especially given that it only affects users of this niche feature, and only CGUs with abormally long filenames.

let mangled_name = Self::mangle_name(human_readable_name);
let truncate_to = human_readable_name
.floor_char_boundary(MAX_CGU_NAME_LENGTH - mangled_name.len() - 1);
format!("{}-{}", &human_readable_name[..truncate_to], mangled_name).into()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truncating the name without any indication that it is truncated would be confusing IMO. Maybe you can use -Zprint-mono-items instead of -Zhuman-readable-cgu-names for your use case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bjorn3 Happy to add some indicator that truncation has occurred, but it doesn't make sense to just leave the feature broken... Most filesystems have a 256 character filename limit, so if you happen to get a CGU with a name longer than that it just won't compile with this option enabled.

I think you also need to consider just how long 200 characters is... It's very, very obvious when looking at the filenames that such a CGU has an unnaturally long filename compared to other CGUs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR to add an indicator when the filename was truncated.

…s` option

Prior to this change, cgu names could be generated which would result in
filenames longer than the limit imposed by the OS.
@Diggsey Diggsey force-pushed the db-limit-cgu-name-length branch from 12d4fd7 to fdb660e Compare May 26, 2025 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

-Zhuman-readable-cgu-names generates filenames that exceed the character limit on windows
8 participants