Skip to content

Implement Debug for EncodeWide #140153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

thaliaarchi
Copy link
Contributor

@thaliaarchi thaliaarchi commented Apr 22, 2025

Since std::os::windows::ffi::EncodeWide is reexported from std::sys_common::wtf8::EncodeWide, which has #![allow(missing_debug_implementations)] in the parent module, it does not implement Debug.

Implement it like core::str::Chars. Format the WTF-16 code units as char literals and escape them when they're surrogate halves. Although this is not valid Rust syntax, this is fine for Debug.

This becomes insta-stable.

r? libs-api

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 22, 2025
@bjorn3
Copy link
Member

bjorn3 commented Apr 23, 2025

We can't format it like char, because \u escape sequences for surrogate halves are invalid syntax in Rust.

Even if rust doesn't accept \u for surrogate halves, we can still format them as \u. Debug isn't guaranteed to be valid rust source code anyway.

@thaliaarchi
Copy link
Contributor Author

thaliaarchi commented Apr 23, 2025

Here are the options, as I see them. I think formatting them as chars when not surrogates is the most readable for most texts. Rust style seems to lean towards uppercase hex.

  1. EncodeWide([97, 233, 32, 55357, 55357, 56489]): decimal
  2. EncodeWide([0x61, 0xE9, 0x20, 0xD83D, 0xD83D, 0xDCA9]): hex, zero-pad min 2, upper
  3. EncodeWide([0x61, 0xe9, 0x20, 0xd83d, 0xd83d, 0xdca9]): hex, zero-pad min 2, lower
  4. EncodeWide([0x0061, 0x00E9, 0x0020, 0xD83D, 0xD83D, 0xDCA9]): hex, zero-pad 4, upper
  5. EncodeWide([0x0061, 0x00e9, 0x0020, 0xd83d, 0xd83d, 0xdca9]): hex, zero-pad 4, lower
  6. EncodeWide(['a', 'é', ' ', '\u{D83D}', '\u{D83D}', '\u{DCA9}']): pseudo-char, escaped surrogates, upper
  7. EncodeWide(['a', 'é', ' ', '\u{d83d}', '\u{d83d}', '\u{dca9}']): pseudo-char, escaped surrogates, lower

@tgross35 tgross35 added needs-fcp This change is insta-stable, or significant enough to need a team FCP to proceed. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 25, 2025
@thaliaarchi
Copy link
Contributor Author

thaliaarchi commented Jun 26, 2025

I've updated formatting to be 4-wide zero-padded hex (style 4)

For another opinion, @ChrisDenton wrote:

I think EncodeWide, semantically, produces a string like a str but with a different encoding. That said, EncodeWide itself is an iterator more akin to a Bytes iterator of &str (except u16 instead of bytes). But this is all made more complicated by the fact that std doesn't have a WideStr type so people typically collect into a Vec<u16>, which doesn't know it's a wide string. Whereas the iterator does know. So I am sympathetic to the char like representation but for debug I do think we really have to go with a hex repr. And my gut preference is for the 4 zero pad, which is more suggestive of unicode without introducing illegal rust syntax. But I reserve the right to change my mind on that 😆

I think the char-like repr is a non-starter in any case because it looks like rust code but may be illegal (i.e. for surrogates).

(in favor of style 4)

Since `std::os::windows::ffi::EncodeWide` is reexported from
`std::sys_common::wtf8::EncodeWide`, which has
`#![allow(missing_debug_implementations)]` in the parent module, it does
not implement `Debug`.
@thaliaarchi
Copy link
Contributor Author

What's needed next to move this along?

@ChrisDenton ChrisDenton added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jun 27, 2025
@ChrisDenton
Copy link
Member

I've nominated for libs-api. I don't know if it needs a full libs API discussion but I believe at least someone from the team will need to sign off on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-libs-api-nominated Nominated for discussion during a libs-api team meeting. needs-fcp This change is insta-stable, or significant enough to need a team FCP to proceed. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants