
gh-133968: Add PyUnicodeWriter_WriteASCII() function #133973


Open: wants to merge 2 commits into main
Conversation

@vstinner (Member) commented May 13, 2025

Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII().


📚 Documentation preview 📚: https://cpython-previews--133973.org.readthedocs.build/

Commit: Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII().

@vstinner (Member, Author) commented:

JSON benchmark: #133832 (comment)

| Benchmark | ref | change |
|---|---|---|
| encode 100 booleans | 7.15 us | 6.54 us: 1.09x faster |
| encode 100 integers | 11.6 us | 11.7 us: 1.01x slower |
| encode 100 "ascii" strings | 13.4 us | 13.2 us: 1.02x faster |
| encode escaped string len=128 | 1.11 us | 1.10 us: 1.01x faster |
| encode 1000 booleans | 39.3 us | 32.9 us: 1.19x faster |
| encode Unicode string len=1000 | 4.93 us | 4.94 us: 1.00x slower |
| encode 10000 booleans | 343 us | 286 us: 1.20x faster |
| encode ascii string len=10000 | 28.5 us | 28.8 us: 1.01x slower |
| encode escaped string len=9984 | 38.7 us | 38.9 us: 1.00x slower |
| encode Unicode string len=10000 | 42.6 us | 42.4 us: 1.00x faster |
| Geometric mean | (ref) | 1.02x faster |

Benchmark hidden because not significant (11): encode 100 floats, encode ascii string len=100, encode Unicode string len=100, encode 1000 integers, encode 1000 floats, encode 1000 "ascii" strings, encode ascii string len=1000, encode escaped string len=896, encode 10000 integers, encode 10000 floats, encode 10000 "ascii" strings

A speedup of up to 1.20x for encoding booleans is interesting, given that these strings are very short: "true" (4 characters) and "false" (5 characters).

@vstinner (Member, Author) commented:

The PyUnicodeWriter_WriteASCII() function is faster than PyUnicodeWriter_WriteUTF8(), but it has undefined behavior if the input string contains non-ASCII characters.
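
For illustration, here is a minimal usage sketch of the proposed API (my example, not code from the PR); PyUnicodeWriter_Create(), PyUnicodeWriter_Discard() and PyUnicodeWriter_Finish() are the existing writer functions, and the caller is responsible for passing only ASCII bytes:

#include <Python.h>

/* Build the string "true" using the unchecked ASCII fast path.
   The literal is known to be pure ASCII, so the contract holds. */
static PyObject *
build_true(void)
{
    PyUnicodeWriter *writer = PyUnicodeWriter_Create(0);
    if (writer == NULL) {
        return NULL;
    }
    if (PyUnicodeWriter_WriteASCII(writer, "true", 4) < 0) {
        PyUnicodeWriter_Discard(writer);
        return NULL;
    }
    return PyUnicodeWriter_Finish(writer);
}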

@serhiy-storchaka: What do you think of this function?

@vstinner (Member, Author) commented:

cc @ZeroIntensity

@ZeroIntensity (Member) left a comment:

Some nits

@@ -1806,9 +1806,24 @@ object.
See also :c:func:`PyUnicodeWriter_DecodeUTF8Stateful`.
.. c:function:: int PyUnicodeWriter_WriteASCII(PyUnicodeWriter *writer, const char *str, Py_ssize_t size)

Does the caller need to hold a thread state? That's not immediately clear to me with a string writing API.

const char *str,
Py_ssize_t size)
{
_PyUnicodeWriter *priv_writer = (_PyUnicodeWriter*)writer;

It's nice to have sanity-check assertions like this:

Suggested change:
-    _PyUnicodeWriter *priv_writer = (_PyUnicodeWriter*)writer;
+    assert(writer != NULL);
+    _PyUnicodeWriter *priv_writer = (_PyUnicodeWriter*)writer;

If the caller needs a thread state, it would be good to call _Py_AssertHoldsTstate as well.

@serhiy-storchaka (Member) commented:

Well, we had _PyUnicodeWriter_WriteASCIIString for reasons.

But unicode_decode_utf8_writer is already optimized for ASCII. Can it be optimized even more? In theory, it can be made almost as fast as _PyUnicodeWriter_WriteASCIIString.

We can add a private _PyUnicodeWriter_WriteASCII for now, to avoid a regression in JSON encoding, and then try to squeeze nanoseconds from PyUnicodeWriter_WriteUTF8. If we fail, we can add a public PyUnicodeWriter_WriteASCII.

Co-authored-by: Peter Bierma <[email protected]>
@vstinner (Member, Author) commented:

> But unicode_decode_utf8_writer is already optimized for ASCII. Can it be optimized even more?

I don't think it can become as fast as, or faster than, a function which takes an ASCII string as argument. If we know that the input string is ASCII, there is no need to scan the string for non-ASCII characters, and we can take the fast path.

You're right that the UTF-8 decoder is already highly optimized.

@vstinner (Member, Author) commented:

In short:

  • PyUnicodeWriter_WriteUTF8() calls ascii_decode(), which is an efficient ASCII decoder.
  • PyUnicodeWriter_WriteASCII() calls memcpy().

It's hard to beat memcpy() performance!
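
For illustration, a simplified sketch of the difference (my sketch, not the actual CPython code): the UTF-8 path has to scan the input for non-ASCII bytes before it can copy, while the ASCII path can copy unconditionally.

#include <Python.h>
#include <string.h>

/* UTF-8 path (simplified): scan first; fall back to full UTF-8 decoding
   if a non-ASCII byte is found, otherwise copy. */
static int
write_utf8_sketch(char *buffer, const char *str, Py_ssize_t size)
{
    for (Py_ssize_t i = 0; i < size; i++) {
        if ((unsigned char)str[i] >= 128) {
            return -1;  /* non-ASCII: needs the slow UTF-8 decoder */
        }
    }
    memcpy(buffer, str, (size_t)size);
    return 0;
}

/* ASCII path (simplified): the caller guarantees pure ASCII,
   so a single memcpy() is enough. */
static void
write_ascii_sketch(char *buffer, const char *str, Py_ssize_t size)
{
    memcpy(buffer, str, (size_t)size);
}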

@serhiy-storchaka (Member) commented:

Yes, although it was close, at least for moderately large strings. Could it be optimized even more? I don't know.

But the decision about PyUnicodeWriter_WriteASCII should be made by the C API Workgroup. I'm not sure about my opinion yet. This API is unsafe.

@vstinner (Member, Author) commented:

I created the capi-workgroup/decisions#65 issue.

@vstinner (Member, Author) commented:

Benchmark:

write_utf8 size=10: Mean +- std dev: 153 ns +- 1 ns
write_utf8 size=100: Mean +- std dev: 174 ns +- 1 ns
write_utf8 size=1,000: Mean +- std dev: 279 ns +- 0 ns
write_utf8 size=10,000: Mean +- std dev: 1.36 us +- 0.00 us

write_ascii size=10: Mean +- std dev: 141 ns +- 0 ns
write_ascii size=100: Mean +- std dev: 149 ns +- 0 ns
write_ascii size=1,000: Mean +- std dev: 176 ns +- 3 ns
write_ascii size=10,000: Mean +- std dev: 690 ns +- 8 ns

On long strings (10,000 bytes), PyUnicodeWriter_WriteASCII() is up to 2x faster (1.36 us => 690 ns) than PyUnicodeWriter_WriteUTF8().

from _testcapi import PyUnicodeWriter
import pyperf

range_100 = range(100)

# 10 unrolled writes x 100 iterations = 1,000 writes per call,
# matching inner_loops=1_000 in runner.bench_func() below.
def bench_write_utf8(text, size):
    writer = PyUnicodeWriter(0)
    for _ in range_100:
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)

def bench_write_ascii(text, size):
    writer = PyUnicodeWriter(0)
    for _ in range_100:
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)

runner = pyperf.Runner()
sizes = (10, 100, 1_000, 10_000)

for size in sizes:
    text = b'x' * size
    runner.bench_func(f'write_utf8 size={size:,}', bench_write_utf8, text, size,
                      inner_loops=1_000)

for size in sizes:
    text = b'x' * size
    runner.bench_func(f'write_ascii size={size:,}', bench_write_ascii, text, size,
                      inner_loops=1_000)

@encukou (Member) commented May 15, 2025

Do we know where the bottleneck is for long strings?
Would it make sense to have a version of find_first_nonascii that checks and copies in the same loop?
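
For illustration (a hypothetical sketch, not code from the PR or CPython), a combined check-and-copy loop along those lines might look like this; the name copy_until_nonascii and the byte-at-a-time loop are my own, and a real version would likely process a word at a time as find_first_nonascii does:

#include <Python.h>

/* Copy bytes while scanning for the first non-ASCII byte in the same
   loop, instead of scanning first and copying afterwards. Returns the
   number of ASCII bytes copied; the caller falls back to UTF-8 decoding
   for the remainder of the input. */
static Py_ssize_t
copy_until_nonascii(char *dest, const char *src, Py_ssize_t size)
{
    Py_ssize_t i = 0;
    for (; i < size; i++) {
        unsigned char ch = (unsigned char)src[i];
        if (ch >= 128) {
            break;
        }
        dest[i] = (char)ch;
    }
    return i;
}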
