Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix more Unicode bugs #24182

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Nov 4, 2024

  • Use Latin-1 in many native file write rules for consistency with the internal encoding.
  • Use Latin-1 for the resolved repository file and the JSON profile.
  • Fix unused_input_list handling of non-ASCII characters in file names.
  • Flip the legacy_utf8 parameter of repository_ctx.file to False and make it a no-op. With the previous default, any non-ASCII characters would be written out as double encoded UTF-8, which is not a useful choice.
  • Change repository_ctx.template to operate on raw bytes for consistency with repository_ctx.read and to fix substitution with non-ASCII keys/values.
  • Move some usages of UTF_8 closer to their usage site to clarify why they are correct.
  • Fixes parsing of dependency files with Unicode character contents (/showIncludes and .d files)

@fmeum fmeum force-pushed the 23859-unicode-starlark branch 5 times, most recently from d539093 to b378d1f Compare November 5, 2024 14:29
@fmeum fmeum requested a review from tjgq November 5, 2024 15:55
@fmeum fmeum marked this pull request as ready for review November 5, 2024 15:55
@fmeum fmeum requested review from aranguyen and removed request for a team November 5, 2024 15:55
@github-actions github-actions bot added team-Performance Issues for Performance teams team-Configurability platforms, toolchains, cquery, select(), config transitions team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. team-Rules-Java Issues for Java rules team-Rules-CPP Issues for C++ rules team-Local-Exec Issues and PRs for the Execution (Local) team team-Remote-Exec Issues and PRs for the Execution (Remote) team awaiting-review PR is awaiting review from an assigned reviewer labels Nov 5, 2024
@fmeum
Copy link
Collaborator Author

fmeum commented Nov 5, 2024

@tjgq This is stacked on #24010, but the last commit is new and ready for review.

@tjgq
Copy link
Contributor

tjgq commented Nov 7, 2024

@fmeum Mind rebasing before I review?

@fmeum
Copy link
Collaborator Author

fmeum commented Nov 11, 2024

@tjgq Thanks for the review, it helped me find and fix more issues.

I noticed that there are a bunch of FileSystemUtils methods that read and write files in different encodings. I could clean that up in a follow-up PR or as part of this PR, as you prefer.

@fmeum fmeum requested a review from tjgq November 11, 2024 13:43
@tjgq
Copy link
Contributor

tjgq commented Nov 11, 2024

@tjgq Thanks for the review, it helped me find and fix more issues.

I noticed that there are a bunch of FileSystemUtils methods that read and write files in different encodings. I could clean that up in a follow-up PR or as part of this PR, as you prefer.

I'd prefer a separate PR, thanks!

@tjgq
Copy link
Contributor

tjgq commented Nov 11, 2024

@fmeum it looks like there's an issue with .netrc parsing? (see failed tests)

@fmeum
Copy link
Collaborator Author

fmeum commented Nov 11, 2024

@fmeum it looks like there's an issue with .netrc parsing? (see failed tests)

Should be fixed now

@fmeum fmeum requested a review from tjgq November 11, 2024 14:10
@fmeum fmeum force-pushed the 23859-unicode-starlark branch 3 times, most recently from 67fbb4e to 60bacc5 Compare November 11, 2024 16:40
@fmeum
Copy link
Collaborator Author

fmeum commented Nov 11, 2024

@meteorcloudy Compiling a cc_library that depends on a header file with non-ASCII characters in its name results in this kind of error:

ERROR: C:/b/absmfft5/execroot/_main/_tmp/925c5432c7427d85ec55d631e57c838b/workspace/pkg/BUILD:5:10: Compiling pkg/bin.cc failed: undeclared inclusion(s) in rule '//pkg:bin':
this rule is missing dependency declarations for the following files included by 'pkg/bin.cc':
  'pkg/„”�Ž™šá??.h'

The raw bytes of the file name are 112, 107, 103, 47, -124, -108, -127, -114, -103, -102, -31, 63, 63, 46, 104, which indicates that the output of MSVC is in the OEM codepage 437. This is hinted at here. Do you know of a way to force cl.exe to output Unicode when it's not running in a console (or, alternatively, to have it think it's running in a console)? I guess we could wrap cl.exe in a .bat script that runs chcp 65001 prior to invoking the compiler, but that seems pretty intrusive.

@fmeum
Copy link
Collaborator Author

fmeum commented Nov 12, 2024

@fmeum fmeum force-pushed the 23859-unicode-starlark branch 2 times, most recently from 38b4907 to c2226bd Compare November 12, 2024 11:28
@fmeum
Copy link
Collaborator Author

fmeum commented Nov 12, 2024

I have given up on getting cl.exe to work. Users could presumably still do this themselves by switching their system code page to UTF-8. CI should be green now that I am skipping the test on Windows.

@fmeum
Copy link
Collaborator Author

fmeum commented Nov 14, 2024

@tjgq Rebased onto master.

@tjgq tjgq added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Nov 15, 2024
@fmeum
Copy link
Collaborator Author

fmeum commented Nov 15, 2024

@bazel-io fork 8.0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally team-Configurability platforms, toolchains, cquery, select(), config transitions team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. team-Local-Exec Issues and PRs for the Execution (Local) team team-Performance Issues for Performance teams team-Remote-Exec Issues and PRs for the Execution (Remote) team team-Rules-CPP Issues for C++ rules team-Rules-Java Issues for Java rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants