
Using the public PyUnicodeWriter C API made the json module slower #133968

Open
vstinner opened this issue May 13, 2025 · 3 comments
Labels
extension-modules C modules in the Modules dir performance Performance or resource usage stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Member

vstinner commented May 13, 2025

I modified the json module to replace the private _PyUnicodeWriter C API with the public PyUnicodeWriter C API:

Problem: it made the json module slower. Let's investigate what's going on.

Linked PRs

@vstinner vstinner added performance Performance or resource usage stdlib Python modules in the Lib dir labels May 13, 2025
@vstinner
Member Author

See also #133186.

vstinner added a commit to vstinner/cpython that referenced this issue May 13, 2025
Don't call PyObject_Str() if the input type is str.
vstinner added a commit that referenced this issue May 13, 2025
Don't call PyObject_Str() if the input type is str.
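The idea behind this fast path can be shown with a self-contained toy model. This is a minimal sketch built on hypothetical `toy_*` types and functions, not CPython's implementation. `toy_to_str()` stands in for PyObject_Str() and always builds a temporary string. The fast path skips that step when the input is already a string and appends it directly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy "object": either already a string or an integer. */
typedef enum { TOY_STR, TOY_INT } toy_kind;

typedef struct {
    toy_kind kind;
    const char *str;  /* used when kind == TOY_STR */
    long num;         /* used when kind == TOY_INT */
} toy_object;

typedef struct {
    char buf[256];
    size_t len;
    int conversions;  /* how many times the slow conversion ran */
} toy_writer;

/* Stand-in for PyObject_Str(): always builds a temporary string,
   even when the input is already a string. */
static int toy_to_str(toy_writer *w, const toy_object *obj,
                      char *tmp, size_t tmpsize)
{
    w->conversions++;
    int n = (obj->kind == TOY_STR)
                ? snprintf(tmp, tmpsize, "%s", obj->str)
                : snprintf(tmp, tmpsize, "%ld", obj->num);
    return (n < 0 || (size_t)n >= tmpsize) ? -1 : 0;
}

/* Append a NUL-terminated string, keeping buf NUL-terminated. */
static int toy_append(toy_writer *w, const char *s)
{
    size_t n = strlen(s);
    if (w->len + n >= sizeof(w->buf)) {
        return -1;
    }
    memcpy(w->buf + w->len, s, n + 1);
    w->len += n;
    return 0;
}

/* Without the fast path: every input goes through the conversion
   (tmp limits this toy to short inputs). */
static int toy_write_str_slow(toy_writer *w, const toy_object *obj)
{
    char tmp[64];
    if (toy_to_str(w, obj, tmp, sizeof tmp) < 0) {
        return -1;
    }
    return toy_append(w, tmp);
}

/* With the fast path: strings are appended without any conversion. */
static int toy_write_str(toy_writer *w, const toy_object *obj)
{
    assert(w != NULL && obj != NULL);
    if (obj->kind == TOY_STR) {
        return toy_append(w, obj->str);
    }
    return toy_write_str_slow(w, obj);
}
```

Both variants produce the same text. The only difference is that the slow variant pays for one conversion per string input.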
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 13, 2025
…H-133969)

Don't call PyObject_Str() if the input type is str.
(cherry picked from commit fe9f6e8)

Co-authored-by: Victor Stinner
vstinner added a commit that referenced this issue May 13, 2025
) (#133971)

gh-133968: Add fast path to PyUnicodeWriter_WriteStr() (GH-133969)

Don't call PyObject_Str() if the input type is str.
(cherry picked from commit fe9f6e8)

Co-authored-by: Victor Stinner
vstinner added a commit to vstinner/cpython that referenced this issue May 13, 2025
Replace most PyUnicodeWriter_WriteUTF8() calls with
PyUnicodeWriter_WriteASCII().
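The following minimal sketch shows why this kind of replacement can help. It uses a hypothetical toy writer, not CPython's code. A UTF-8 writer must inspect its input before copying it. An ASCII writer trusts the caller and copies directly. The real functions do more work than this toy: for example, the toy UTF-8 path rejects non-ASCII input instead of decoding it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy writer with a fixed-size byte buffer. */
typedef struct {
    char buf[256];
    size_t len;
} toy_writer;

static int toy_is_ascii(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((unsigned char)s[i] >= 0x80) {
            return 0;
        }
    }
    return 1;
}

static int toy_copy(toy_writer *w, const char *s, size_t n)
{
    if (n > sizeof(w->buf) - w->len) {
        return -1;
    }
    memcpy(w->buf + w->len, s, n);
    w->len += n;
    return 0;
}

/* UTF-8 flavour: the input may contain any bytes, so it is scanned
   before being copied. To stay short, this toy rejects non-ASCII
   input instead of decoding it. */
static int toy_write_utf8(toy_writer *w, const char *s, size_t n)
{
    if (!toy_is_ascii(s, n)) {
        return -1;
    }
    return toy_copy(w, s, n);
}

/* ASCII flavour: the caller promises ASCII input, so the scan is
   skipped. The assert is a sanity check for this sketch only and
   disappears when compiled with NDEBUG. */
static int toy_write_ascii(toy_writer *w, const char *s, size_t n)
{
    assert(toy_is_ascii(s, n));
    return toy_copy(w, s, n);
}
```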
@encukou
Member

encukou commented May 15, 2025

Could you try overoptimizing PyUnicodeWriter_WriteUTF8 and benchmarking that, to see how fast the existing function can “theoretically” be?
I'm thinking something like:

  • in the JSON module, force the constants ("null" etc) to be size_t-aligned
  • add private (ABI-only) function _PyUnicodeWriter_WriteUTF8_SmallAligned, which requires size_t-aligned input and size <= sizeof(size_t)
  • make PyUnicodeWriter_WriteUTF8 a macro that calls _PyUnicodeWriter_WriteUTF8_SmallAligned if these requirements are met, and the regular PyUnicodeWriter_WriteUTF8 function otherwise
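The dispatch proposed above could look roughly like the sketch below. It uses hypothetical `toy_*` names and is not a proposed CPython API. When the pointer is aligned to `sizeof(size_t)` and the length fits in one word, the ASCII check becomes a single mask test on one word. Otherwise the general byte-by-byte path runs. To stay within defined C behavior, the small path copies only `n` bytes into a zeroed word rather than loading a full word past the end of the string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char buf[256];
    size_t len;
} toy_writer;

/* A size_t with 0x80 in every byte, whatever the width of size_t. */
#define TOY_HIGH_BITS ((size_t)-1 / 0xFF * 0x80)

static int toy_copy(toy_writer *w, const char *s, size_t n)
{
    if (n > sizeof(w->buf) - w->len) {
        return -1;
    }
    memcpy(w->buf + w->len, s, n);
    w->len += n;
    return 0;
}

/* General path: check every byte, then copy (non-ASCII is rejected
   here instead of being decoded, to keep the sketch short). */
static int toy_write_utf8(toy_writer *w, const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((unsigned char)s[i] & 0x80) {
            return -1;
        }
    }
    return toy_copy(w, s, n);
}

/* Hypothetical small/aligned path: the whole input fits in one word,
   so one mask test replaces the per-byte loop. */
static int toy_write_utf8_small_aligned(toy_writer *w, const char *s, size_t n)
{
    assert((uintptr_t)s % sizeof(size_t) == 0);
    assert(n <= sizeof(size_t));
    size_t word = 0;
    memcpy(&word, s, n);  /* unused bytes stay zero */
    if (word & TOY_HIGH_BITS) {
        return -1;
    }
    return toy_copy(w, s, n);
}

#define TOY_IS_SMALL_ALIGNED(s, n) \
    ((uintptr_t)(s) % sizeof(size_t) == 0 && (size_t)(n) <= sizeof(size_t))

/* Like any function-like macro, this evaluates s and n more than once;
   a static inline function would avoid that. */
#define TOY_WRITE_UTF8(w, s, n)                         \
    (TOY_IS_SMALL_ALIGNED(s, n)                         \
         ? toy_write_utf8_small_aligned((w), (s), (n))  \
         : toy_write_utf8((w), (s), (n)))
```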

@picnixz picnixz added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels May 15, 2025
@vstinner
Member Author

> in the JSON module, force the constants ("null" etc) to be size_t-aligned

I don't know how to guarantee that.
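One possible way to get that guarantee is sketched below. It was not proposed in this thread and assumes a C11 compiler. A string literal cannot carry an alignment specifier, but a named array initialized from a literal can. The array names here are hypothetical. The trade-off is that each constant must become a named object instead of an inline literal at the call site.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* alignas() needs a power of two; sizeof(size_t) is one on common targets. */
static_assert((sizeof(size_t) & (sizeof(size_t) - 1)) == 0,
              "sizeof(size_t) must be a power of two");

/* Hypothetical names for the JSON constants, aligned to sizeof(size_t). */
static const alignas(sizeof(size_t)) char json_null[] = "null";
static const alignas(sizeof(size_t)) char json_true[] = "true";
static const alignas(sizeof(size_t)) char json_false[] = "false";

/* Returns nonzero if p is aligned to sizeof(size_t). */
static int is_word_aligned(const void *p)
{
    return (uintptr_t)p % sizeof(size_t) == 0;
}
```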

> add private (ABI-only) function _PyUnicodeWriter_WriteUTF8_SmallAligned, which requires size_t-aligned input and size <= sizeof(size_t)

Why do you want to limit the size to sizeof(size_t)? What kind of optimization are you thinking of? find_first_nonascii() already handles the different cases with different optimizations.

ascii_decode() is already highly optimized.
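For context on the find_first_nonascii() remark, here is a sketch of the general word-at-a-time technique for locating the first non-ASCII byte. It is an illustrative reimplementation with a hypothetical `first_nonascii()` name, not CPython's code. It checks single bytes until the pointer is aligned, then tests one `size_t` word at a time against a 0x80...80 mask. When a word fails the test, or when fewer than one word of input remains, it switches back to single bytes to find the exact position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A size_t with 0x80 in every byte, whatever the width of size_t. */
#define HIGH_BITS ((size_t)-1 / 0xFF * 0x80)

/* Hypothetical helper: index of the first byte >= 0x80 in s[0..n),
   or n if every byte is ASCII. */
static size_t first_nonascii(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t i = 0;

    /* Head: single bytes until s + i is aligned to sizeof(size_t). */
    while (i < n && (uintptr_t)(s + i) % sizeof(size_t) != 0) {
        if (s[i] & 0x80) {
            return i;
        }
        i++;
    }
    /* Body: one aligned word per iteration; stop at the first word
       containing a byte with its high bit set. */
    while (n - i >= sizeof(size_t)) {
        size_t word;
        memcpy(&word, s + i, sizeof word);
        if (word & HIGH_BITS) {
            break;
        }
        i += sizeof(size_t);
    }
    /* Tail, or the word that failed: single bytes to find the exact index. */
    while (i < n) {
        if (s[i] & 0x80) {
            return i;
        }
        i++;
    }
    return n;
}
```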

3 participants