PEP 756: Add PyUnicode_EXPORT_ALLOW_COPY flag (#3988)

python · Sep 24, 2024 · f085d19
1 parent 680c8b1
Showing 1 changed file with 35 additions and 9 deletions.
diff --git a/peps/pep-0756.rst b/peps/pep-0756.rst
@@ -21,9 +21,9 @@ Add functions to the limited C API version 3.14:
   view.
 * ``PyUnicode_Import()``: import a Python str object.
 
-In general, ``PyUnicode_Export()`` has an *O*\ (1) complexity: no memory
-copy is needed. See the :ref:`specification <export-complexity>` for
-cases when a copy is needed.
+By default, ``PyUnicode_Export()`` has an *O*\ (1) complexity: no memory
+is copied. See the :ref:`specification <export-complexity>` for cases
+when a copy is needed.
 
 
 Rationale
@@ -95,6 +95,8 @@ Add the following API to the limited C API version 3.14::
     #define PyUnicode_FORMAT_UTF8  0x08   // char*
     #define PyUnicode_FORMAT_ASCII 0x10   // char* (ASCII string)
 
+    #define PyUnicode_EXPORT_ALLOW_COPY 0x10000
+
 The ``int32_t`` type is used instead of ``int`` to have a well-defined
 type size and not depend on the platform or the compiler.
 See `Avoid C-specific Types
@@ -150,18 +152,41 @@ flags.
 
 Note that future versions of Python may introduce additional formats.
 
+By default, no memory is copied and no conversion is done.
+
+If the ``PyUnicode_EXPORT_ALLOW_COPY`` flag is set in
+*requested_formats*, the function can copy memory to provide the
+requested format and convert from one format to another.
+
+The ``PyUnicode_EXPORT_ALLOW_COPY`` flag is needed to export a string
+containing surrogate characters to ``PyUnicode_FORMAT_UTF8``.
+
+Available flags:
+
+===============================  ===========  ===================================
+Flag                             Value        Description
+===============================  ===========  ===================================
+``PyUnicode_EXPORT_ALLOW_COPY``  ``0x10000``  Allow memory copies and conversions
+===============================  ===========  ===================================
+
+
 .. _export-complexity:
 
 Export complexity
 -----------------
 
-In general, an export has a complexity of *O*\ (1): no memory copy is
-needed. There are cases when a copy is needed, *O*\ (*n*) complexity:
+By default, an export has a complexity of *O*\ (1): no memory is copied
+and no conversion is done. There is one exception: if only UTF-8 is
+requested and the UTF-8 cache is not filled, the string is encoded to
+UTF-8 once, in *O*\ (*n*), to fill the cache; later exports reuse it.
+
+If the ``PyUnicode_EXPORT_ALLOW_COPY`` flag is set, a copy with
+*O*\ (*n*) complexity is needed in the following cases:
 
 * If only UCS-2 is requested and the native format is UCS-1.
 * If only UCS-4 is requested and the native format is UCS-1 or UCS-2.
-* If only UTF-8 is requested: the string is encoded to UTF-8 at the
-  first call, and then the encoded UTF-8 string is cached.
+* If only UTF-8 is requested and the string contains surrogate
+  characters.
 
 To get the best performance on CPython and PyPy, it's recommended to
 support these 4 formats::
@@ -236,8 +261,8 @@ The ``PyUnicode_FORMAT_ASCII`` format is mostly useful for
 characters.
 
 
-Surrogate characters and NUL characters
----------------------------------------
+Surrogate characters and embedded NUL characters
+------------------------------------------------
 
 Surrogate characters are allowed: they can be imported and exported. For
 example, the UTF-8 format uses the ``surrogatepass`` error handler.
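To illustrate what the ``surrogatepass`` error handler changes for UTF-8, here is a minimal C sketch of a single-code-point UTF-8 encoder. The functions ``encode_utf8_surrogatepass()`` and ``encode_utf8_strict()`` are hypothetical helpers for illustration, not part of the proposed API. With ``surrogatepass``, a lone surrogate such as U+DC80 is written as an ordinary 3-byte sequence, while a strict encoder rejects it. This is presumably why a string containing surrogates cannot be served from a strict UTF-8 cache and needs ``PyUnicode_EXPORT_ALLOW_COPY``.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8, "surrogatepass"
   style. Lone surrogates U+D800..U+DFFF are written as ordinary 3-byte
   sequences instead of being rejected. Returns the number of bytes
   written (1-4), or 0 if the code point is above U+10FFFF. */
static size_t
encode_utf8_surrogatepass(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        /* Surrogates land here too: a strict encoder would reject them. */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical strict variant: returns 0 for surrogates, mimicking the
   "strict" error handler; otherwise behaves like the encoder above. */
static size_t
encode_utf8_strict(uint32_t cp, unsigned char out[4])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;
    }
    return encode_utf8_surrogatepass(cp, out);
}
```

For example, ``encode_utf8_surrogatepass(0xDC80, buf)`` returns 3 and fills ``buf`` with ``0xED 0xB2 0x80``, the same bytes Python produces for ``'\udc80'.encode('utf-8', 'surrogatepass')``, whereas ``encode_utf8_strict(0xDC80, buf)`` returns 0.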
@@ -347,6 +372,7 @@ to return NULL on embedded null characters
 Rejecting embedded NUL characters requires scanning the string, which has
 *O*\ (*n*) complexity.
 
+
 Reject surrogate characters
 ---------------------------
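The export-complexity rules in this diff can be summarized as a small decision function. The following C sketch is a simplified model, not CPython's implementation: the ``FMT_*``, ``EXPORT_*`` and ``model_export()`` names are hypothetical, ``FMT_UTF8`` and ``EXPORT_ALLOW_COPY`` reuse the values shown in the diff, the UCS-1/2/4 values are assumed, the ASCII format is left out, and the precedence used when several non-native formats are requested together is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PEP constants. */
#define FMT_UCS1 0x01              /* assumed value */
#define FMT_UCS2 0x02              /* assumed value */
#define FMT_UCS4 0x04              /* assumed value */
#define FMT_UTF8 0x08              /* value shown in the diff */
#define EXPORT_ALLOW_COPY 0x10000  /* value shown in the diff */

typedef enum {
    EXPORT_NO_COPY,     /* O(1): native buffer (or filled cache) exposed */
    EXPORT_UTF8_CACHE,  /* O(n) once: UTF-8 cache filled, then reused */
    EXPORT_COPY,        /* O(n): copy or conversion, needs ALLOW_COPY */
    EXPORT_FAIL         /* requested formats cannot be provided */
} export_cost;

/* Model of the rules: 'native' is FMT_UCS1, FMT_UCS2 or FMT_UCS4. */
static export_cost
model_export(int32_t native, bool has_surrogates, bool utf8_cached,
             int32_t requested)
{
    int32_t formats = requested & ~EXPORT_ALLOW_COPY;
    bool allow_copy = (requested & EXPORT_ALLOW_COPY) != 0;

    if (formats & native) {
        return EXPORT_NO_COPY;          /* native format was requested */
    }
    if ((formats & FMT_UTF8) && !has_surrogates) {
        /* Default exception: encode once to fill the UTF-8 cache. */
        return utf8_cached ? EXPORT_NO_COPY : EXPORT_UTF8_CACHE;
    }
    if (!allow_copy) {
        return EXPORT_FAIL;             /* no copy or conversion by default */
    }
    if ((formats & FMT_UCS2) && native == FMT_UCS1) {
        return EXPORT_COPY;             /* widen UCS-1 to UCS-2 */
    }
    if ((formats & FMT_UCS4) && (native == FMT_UCS1 || native == FMT_UCS2)) {
        return EXPORT_COPY;             /* widen to UCS-4 */
    }
    if (formats & FMT_UTF8) {
        return EXPORT_COPY;             /* UTF-8 copy of a string with surrogates */
    }
    return EXPORT_FAIL;                 /* e.g. narrowing UCS-4 to UCS-1 */
}
```

For example, a UCS-1 string requested as ``FMT_UCS4`` yields ``EXPORT_FAIL`` by default and ``EXPORT_COPY`` once ``EXPORT_ALLOW_COPY`` is OR-ed into the requested formats, matching the opt-in behaviour the new flag introduces.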