
gh-133968: Add PyUnicodeWriter_WriteASCII() function #133973


Open: wants to merge 2 commits into main
Conversation

@vstinner (Member) commented May 13, 2025

Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII().


📚 Documentation preview 📚: https://cpython-previews--133973.org.readthedocs.build/

Commit: Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII().

@vstinner (Member, Author) commented:

JSON benchmark: #133832 (comment)

| Benchmark | ref | change |
|---|---|---|
| encode 100 booleans | 7.15 us | 6.54 us: 1.09x faster |
| encode 100 integers | 11.6 us | 11.7 us: 1.01x slower |
| encode 100 "ascii" strings | 13.4 us | 13.2 us: 1.02x faster |
| encode escaped string len=128 | 1.11 us | 1.10 us: 1.01x faster |
| encode 1000 booleans | 39.3 us | 32.9 us: 1.19x faster |
| encode Unicode string len=1000 | 4.93 us | 4.94 us: 1.00x slower |
| encode 10000 booleans | 343 us | 286 us: 1.20x faster |
| encode ascii string len=10000 | 28.5 us | 28.8 us: 1.01x slower |
| encode escaped string len=9984 | 38.7 us | 38.9 us: 1.00x slower |
| encode Unicode string len=10000 | 42.6 us | 42.4 us: 1.00x faster |
| Geometric mean | (ref) | 1.02x faster |

Benchmark hidden because not significant (11): encode 100 floats, encode ascii string len=100, encode Unicode string len=100, encode 1000 integers, encode 1000 floats, encode 1000 "ascii" strings, encode ascii string len=1000, encode escaped string len=896, encode 10000 integers, encode 10000 floats, encode 10000 "ascii" strings

A speedup of up to 1.20x for encoding booleans is interesting, given that these strings are very short: "true" (4 characters) and "false" (5 characters).

@vstinner (Member, Author) commented:

The PyUnicodeWriter_WriteASCII() function is faster than PyUnicodeWriter_WriteUTF8(), but it has undefined behavior if the input string contains non-ASCII characters.
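
For illustration, here is a minimal usage sketch of the proposed API (my example, not code from the PR); PyUnicodeWriter_Create(), PyUnicodeWriter_Discard() and PyUnicodeWriter_Finish() are the existing writer functions, and the caller is responsible for passing only ASCII bytes:

#include <Python.h>

/* Build the string "true" using the unchecked ASCII fast path.
   The literal is known to be pure ASCII, so the contract holds. */
static PyObject *
build_true(void)
{
    PyUnicodeWriter *writer = PyUnicodeWriter_Create(0);
    if (writer == NULL) {
        return NULL;
    }
    if (PyUnicodeWriter_WriteASCII(writer, "true", 4) < 0) {
        PyUnicodeWriter_Discard(writer);
        return NULL;
    }
    return PyUnicodeWriter_Finish(writer);
}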

@serhiy-storchaka: What do you think of this function?

@vstinner (Member, Author) commented:

cc @ZeroIntensity

@ZeroIntensity (Member) left a comment:

Some nits

@@ -1806,9 +1806,24 @@ object.
See also :c:func:`PyUnicodeWriter_DecodeUTF8Stateful`.
.. c:function:: int PyUnicodeWriter_WriteASCII(PyUnicodeWriter *writer, const char *str, Py_ssize_t size)

Does the caller need to hold a thread state? That's not immediately clear to me with a string writing API.

const char *str,
Py_ssize_t size)
{
_PyUnicodeWriter *priv_writer = (_PyUnicodeWriter*)writer;

It's nice to have sanity-check assertions like this:

Suggested change:
-    _PyUnicodeWriter *priv_writer = (_PyUnicodeWriter*)writer;
+    assert(writer != NULL);
+    _PyUnicodeWriter *priv_writer = (_PyUnicodeWriter*)writer;

If the caller needs a thread state, it would be good to call _Py_AssertHoldsTstate as well.

@serhiy-storchaka (Member) commented:

Well, we had _PyUnicodeWriter_WriteASCIIString for reasons.

But unicode_decode_utf8_writer is already optimized for ASCII. Can it be optimized even more? In theory, it can be made almost as fast as _PyUnicodeWriter_WriteASCIIString.

We can add a private _PyUnicodeWriter_WriteASCII for now, to avoid a regression in JSON encoding, and then try to squeeze nanoseconds from PyUnicodeWriter_WriteUTF8. If we fail, we can add a public PyUnicodeWriter_WriteASCII.

Co-authored-by: Peter Bierma <[email protected]>
@vstinner (Member, Author) commented:

> But unicode_decode_utf8_writer is already optimized for ASCII. Can it be optimized even more?

I don't think it can become as fast as, or faster than, a function which takes an ASCII string as argument. If we know that the input string is ASCII, there is no need to scan the string for non-ASCII characters, and we can take the fast path.

You're right that the UTF-8 decoder is already highly optimized.

@vstinner (Member, Author) commented:

In short:

  • PyUnicodeWriter_WriteUTF8() calls ascii_decode(), which is an efficient ASCII decoder.
  • PyUnicodeWriter_WriteASCII() calls memcpy().

It's hard to beat memcpy() performance!
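
For illustration, a simplified sketch of the difference (my sketch, not the actual CPython code): the UTF-8 path has to scan the input for non-ASCII bytes before it can copy, while the ASCII path can copy unconditionally.

#include <Python.h>
#include <string.h>

/* UTF-8 path (simplified): scan first; fall back to full UTF-8 decoding
   if a non-ASCII byte is found, otherwise copy. */
static int
write_utf8_sketch(char *buffer, const char *str, Py_ssize_t size)
{
    for (Py_ssize_t i = 0; i < size; i++) {
        if ((unsigned char)str[i] >= 128) {
            return -1;  /* non-ASCII: needs the slow UTF-8 decoder */
        }
    }
    memcpy(buffer, str, (size_t)size);
    return 0;
}

/* ASCII path (simplified): the caller guarantees pure ASCII,
   so a single memcpy() is enough. */
static void
write_ascii_sketch(char *buffer, const char *str, Py_ssize_t size)
{
    memcpy(buffer, str, (size_t)size);
}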

@serhiy-storchaka (Member) commented:

Yes, although it was close, at least for moderately large strings. Could it be optimized even more? I don't know.

But the decision about PyUnicodeWriter_WriteASCII should be made by the C API Workgroup. I'm not sure about my opinion yet. This API is unsafe.

@vstinner (Member, Author) commented:

I created the capi-workgroup/decisions#65 issue.

@vstinner (Member, Author) commented:

Benchmark:

write_utf8 size=10: Mean +- std dev: 153 ns +- 1 ns
write_utf8 size=100: Mean +- std dev: 174 ns +- 1 ns
write_utf8 size=1,000: Mean +- std dev: 279 ns +- 0 ns
write_utf8 size=10,000: Mean +- std dev: 1.36 us +- 0.00 us

write_ascii size=10: Mean +- std dev: 141 ns +- 0 ns
write_ascii size=100: Mean +- std dev: 149 ns +- 0 ns
write_ascii size=1,000: Mean +- std dev: 176 ns +- 3 ns
write_ascii size=10,000: Mean +- std dev: 690 ns +- 8 ns

On long strings (10,000 bytes), PyUnicodeWriter_WriteASCII() is up to 2x faster (1.36 us => 690 ns) than PyUnicodeWriter_WriteUTF8().

from _testcapi import PyUnicodeWriter
import pyperf

range_100 = range(100)

# 10 unrolled writes x 100 iterations = 1,000 writes per call,
# matching inner_loops=1_000 in runner.bench_func() below.
def bench_write_utf8(text, size):
    writer = PyUnicodeWriter(0)
    for _ in range_100:
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)

def bench_write_ascii(text, size):
    writer = PyUnicodeWriter(0)
    for _ in range_100:
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)

runner = pyperf.Runner()
sizes = (10, 100, 1_000, 10_000)

for size in sizes:
    text = b'x' * size
    runner.bench_func(f'write_utf8 size={size:,}', bench_write_utf8, text, size,
                      inner_loops=1_000)

for size in sizes:
    text = b'x' * size
    runner.bench_func(f'write_ascii size={size:,}', bench_write_ascii, text, size,
                      inner_loops=1_000)

@encukou (Member) commented May 15, 2025

Do we know where the bottleneck is for long strings?
Would it make sense to have a version of find_first_nonascii that checks and copies in the same loop?
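
For illustration (a hypothetical sketch, not code from the PR or CPython), a combined check-and-copy loop along those lines might look like this; the name copy_until_nonascii and the byte-at-a-time loop are my own, and a real version would likely process a word at a time as find_first_nonascii does:

#include <Python.h>

/* Copy bytes while scanning for the first non-ASCII byte in the same
   loop, instead of scanning first and copying afterwards. Returns the
   number of ASCII bytes copied; the caller falls back to UTF-8 decoding
   for the remainder of the input. */
static Py_ssize_t
copy_until_nonascii(char *dest, const char *src, Py_ssize_t size)
{
    Py_ssize_t i = 0;
    for (; i < size; i++) {
        unsigned char ch = (unsigned char)src[i];
        if (ch >= 128) {
            break;
        }
        dest[i] = (char)ch;
    }
    return i;
}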
