Strings and bytes in Cython

Python 2

The default str type is a byte string; text lives in the separate unicode type.

>>> type("a")
<type 'str'>
>>> type(b'a')
<type 'str'>
>>> type(u'a')
<type 'unicode'>
>>> type("a".encode("UTF-8"))
<type 'str'>
>>> type("a".decode("UTF-8"))
<type 'unicode'>

Python 3

The str type is Unicode text; byte strings live in the separate bytes type.

>>> type("a")
<class 'str'>
>>> type(b'a')
<class 'bytes'>
>>> type(u'a')
<class 'str'>
>>> type("a".encode("UTF-8"))
<class 'bytes'>
>>> type("a".decode("UTF-8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'

Receiving char* from C

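A Cython helper that decodes a C character pointer to a Python unicode string and returns None for a NULL pointer:

cdef unicode tounicode(char* s):
    if s == NULL:
        # Nothing to decode on the C side.
        return None
    else:
        # "replace" turns invalid UTF-8 bytes into U+FFFD instead of raising.
        return s.decode("UTF-8", "replace")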

In Python 2, the bytes received from C decode to type unicode.

>>> type(c_string.decode("UTF-8"))
<type 'unicode'>

In Python 3, they decode to type str, which is unicode.

>>> type(c_string.decode("UTF-8"))
<class 'str'>
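
To make the effect of the "replace" error handler concrete, here is a minimal plain-Python sketch (not Cython). The name decode_c_bytes is hypothetical, and a bytes object stands in for the char* that tounicode receives.

```python
def decode_c_bytes(raw):
    """Mirror tounicode for a bytes object: None stays None, bytes become text."""
    if raw is None:
        return None
    # "replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return raw.decode("UTF-8", "replace")

# Well-formed UTF-8: the two bytes 0xC3 0xA9 decode to a single character.
valid = decode_c_bytes(b"caf\xc3\xa9")

# A lone 0xE9 (Latin-1 style) is not valid UTF-8 and becomes U+FFFD.
broken = decode_c_bytes(b"caf\xe9")
```

With "strict" in place of "replace", the second call would raise UnicodeDecodeError, which may be preferable when silently altered data is worse than a failure.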

Passing a string to a C function

c_function(item)

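Python 2: item should be a str (which is bytes in Python 2) and needs no conversion. Calling .encode("UTF-8") on an ASCII-only str returns an equal str that can still be passed to C, but a str containing non-ASCII bytes raises UnicodeDecodeError, because Python 2 first decodes it implicitly as ASCII. A unicode object must be encoded.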

Python 3: item must be bytes. A str has to be converted with .encode("UTF-8") before it is passed to C.
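
One way to handle both versions with a single code path is to encode only when the value is not already bytes. The sketch below is plain Python, and to_bytes is a hypothetical helper name rather than anything provided by Cython.

```python
def to_bytes(item, encoding="UTF-8"):
    """Return item as bytes, encoding it only if it is text."""
    # On Python 2, bytes is an alias for str, so byte strings pass through
    # untouched and the implicit ASCII decode is never triggered.
    if isinstance(item, bytes):
        return item
    # Python 3 str or Python 2 unicode: encode to bytes.
    return item.encode(encoding)

from_text = to_bytes(u"caf\u00e9")
from_bytes = to_bytes(b"caf\xc3\xa9")
```

The result of to_bytes(item) can then be handed to c_function in place of item.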

Summary

.encode("UTF-8") when passing text to C (converts to bytes; a Python 2 str is already bytes)
.decode("UTF-8") when receiving from C (converts to unicode; a Python 3 str is unicode)
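
The two rules can be seen together in a small round trip. This sketch uses the standard-library ctypes module purely as a stand-in for the C side (an assumption for illustration; Cython code would use char* directly), and round_trip is a hypothetical name.

```python
import ctypes

def round_trip(text):
    """Encode text, place it in a C char buffer, and decode it back."""
    encoded = text.encode("UTF-8")              # text -> bytes before crossing into C
    buf = ctypes.create_string_buffer(encoded)  # NUL-terminated char array
    raw = buf.value                             # bytes up to the first NUL
    return raw.decode("UTF-8")                  # bytes -> text after coming back

result = round_trip(u"na\u00efve")
```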
