simplejson incorrectly returns byte strings in many cases #369

Open
@hodgestar

The JSON specification defines JSON strings as:

A string is a sequence of Unicode code points wrapped with quotation marks (U+0022).

The natural Python analog of JSON strings appears to be Python unicode strings. Confusingly, under Python 2 the simplejson library sometimes returns unicode strings and sometimes byte strings, e.g.:

>>> simplejson.loads('"\\u00e6"')
u'\xe6'
>>> simplejson.loads('"ae"')
'ae'
>>> simplejson.loads(u'"ae"')
u'ae'

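This makes writing correct code on top of simplejson rather hard. The type of a decoded string depends on whether the input was a byte string and whether the string contains only ASCII characters, so every caller has to handle both types.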
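Below is a minimal caller-side workaround, sketched in Python 3 and assuming UTF-8 payloads. It takes two steps. First, decode raw bytes to text before parsing, which matches the `u'"ae"'` case in the session. Second, recursively convert any byte strings left in an already-decoded result to text. The helper names `loads_text` and `normalize_strings` are hypothetical. `parser` defaults to the stdlib `json.loads`, but any callable that takes text should work.

```python
import json
from typing import Any, Callable, Union


def loads_text(raw: Union[bytes, str],
               parser: Callable[[str], Any] = json.loads) -> Any:
    """Hypothetical helper: hand the JSON parser text, never bytes.

    Assumes the payload is UTF-8 encoded. When the parser receives text,
    every JSON string it returns is text as well.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return parser(raw)


def normalize_strings(value: Any) -> Any:
    """Hypothetical helper: recursively convert byte strings to text.

    Walks lists and dicts (including dict keys) and decodes any bytes
    objects as UTF-8; every other value is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if isinstance(value, list):
        return [normalize_strings(item) for item in value]
    if isinstance(value, dict):
        return {normalize_strings(k): normalize_strings(v)
                for k, v in value.items()}
    return value
```

For example, `loads_text(b'{"key": "ae"}')` returns `{'key': 'ae'}`, with both the key and the value as text.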

The Riak client uses simplejson in two kinds of places. First, it decodes object data in the client. These call sites allow the default encoders and decoders to be overridden, so they are less of an issue. Second, the HTTP transport uses it to decode JSON responses. These call sites provide no way to control which JSON parser is used, and they return decoded keys and indexes in many places, so callers can receive either byte strings or unicode strings.

For the HTTP transport, could we do one of the following?

  • Add an option for controlling which parser the HTTP client uses.
  • Use the stdlib json library, which returns consistently typed results.
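The first option could look roughly like the sketch below. `JsonHttpTransport`, its `json_loads` parameter, and `decode_response` are hypothetical names chosen for illustration; they are not the Riak client's actual API. The default parser is the stdlib `json.loads`, so leaving the option unset also covers the second option.

```python
import json
from typing import Any, Callable, Optional


class JsonHttpTransport:
    """Hypothetical HTTP transport whose JSON parser can be swapped out."""

    def __init__(self, json_loads: Optional[Callable[[str], Any]] = None) -> None:
        # Fall back to the stdlib parser when the caller supplies none.
        self._json_loads = json_loads if json_loads is not None else json.loads

    def decode_response(self, body: bytes) -> Any:
        # Assumes a UTF-8 response body; decode it to text first so the
        # configured parser never receives a byte string.
        return self._json_loads(body.decode("utf-8"))
```

The body is decoded to text before parsing, so even a parser such as `simplejson.loads` would receive unicode input here. According to the session, that input yields unicode strings.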
