PEP 756: Hypothetical CPython Unicode changes #3987

Closed
vstinner wants to merge 1 commit

vstinner
Member

@vstinner vstinner commented Sep 23, 2024


📚 Documentation preview 📚: https://pep-previews--3987.org.readthedocs.build/

@vstinner
Member Author

@serhiy-storchaka @zooba: What do you think? Is it worth it to discuss these two "hypothetical Unicode changes", UTF-8 and UTF-16?

@vstinner
Member Author

@serhiy-storchaka @terryjreedy: Do you recall when/where using UTF-16 was discussed?

@zooba
Member

zooba commented Sep 23, 2024

We ought to start by explaining that we assume that our internal representation will change in the future, and we are deliberately not constraining any changes with this API. Then we could take those hypothetical changes and use them as examples of "if we changed X, then this API would still work if it's used like ...". That shows that the API design is robust (if people use it right), and so is suitable for the limited API. We're just trying to provide an optimisation here, since there are already stable APIs that will behave properly all the time.

The important thing is for us to not get derailed by discussing hypothetical futures. People love to bikeshed about stuff like that, and it's only a distraction here. We need to keep the focus on "this API design is flexible, and won't have to change even if we change its results".

@terryjreedy
Member

For 2.x, UTF-16 was more or less what was used on Windows (or UCS-2 with surrogates? I forget such details) and UTF-32 ~= UCS-4 elsewhere. I am not aware of any proposal to use XXX-16 everywhere. I would just say that new constants will be added if needed.
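
For readers unsure about the "UTF-16 versus UCS-2 with surrogates" distinction raised above, here is a minimal sketch, not taken from the PEP or from CPython. It shows that a code point outside the Basic Multilingual Plane takes two 16-bit code units (a surrogate pair) in UTF-16 but a single 32-bit unit in UTF-32. The helper name `utf16_code_units` is hypothetical and exists only for this illustration.

```python
import struct


def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit code units of text encoded as little-endian UTF-16.

    Hypothetical helper for illustration; not part of any CPython API.
    """
    data = text.encode("utf-16-le")
    # Each UTF-16 code unit is 2 bytes; unpack them as unsigned shorts.
    return list(struct.unpack(f"<{len(data) // 2}H", data))


snake = "\U0001F40D"  # a code point outside the Basic Multilingual Plane

# UTF-16 needs a high surrogate followed by a low surrogate.
print([hex(unit) for unit in utf16_code_units(snake)])

# UTF-32 stores the same code point in exactly one 32-bit unit.
print(len(snake.encode("utf-32-le")) // 4)
```

Strict UCS-2 has no surrogate mechanism, so it cannot represent such a code point at all. That is why a narrow build that stored the pair was closer to UTF-16 in practice.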

@zooba
Member

zooba commented Sep 24, 2024

Last I heard (a while back) Jython implicitly uses UTF-16 as its internal representation, because the Java string type does, and IronPython running on .NET Framework probably does too (I believe .NET Core moved to UTF-8 internally though, and that's the one that matters these days).

So I doubt CPython would ever switch to it, but I wouldn't rule out alternative implementations wanting to use it. And a limited API candidate should take into account alternative implementations.

@vstinner vstinner marked this pull request as draft September 24, 2024 21:25
@vstinner
Member Author

I don't think that this change is still needed with #3999.

@vstinner vstinner closed this Sep 26, 2024
@vstinner vstinner deleted the pep756_hypothetical branch September 26, 2024 14:59