Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

SimpleJSON library has added spaces to API responses #3012

Closed
anoadragon453 opened this issue Mar 18, 2018 · 6 comments
Closed

SimpleJSON library has added spaces to API responses #3012

anoadragon453 opened this issue Mar 18, 2018 · 6 comments
Labels
z-p2 (Deprecated Label)

Comments

@anoadragon453
Copy link
Member

After switching from ujson to simplejson, we've started including spaces in JSON response data. This is both not necessary for bandwidth reasons, and could potentially break compatibility with clients that rely on the old response format.

Reported by @jzk:matrix.org

@ara4n
Copy link
Member

ara4n commented Mar 18, 2018

the current behaviour of Matrix is that the JSON sent over federation does not have to be in canonical form, and is entirely allowed to include whitespace. (And meanwhile the whole CS API is certainly standard RFC7159 JSON and is absolutely allowed to include whitespace; any client which chokes on whitespace is categorically broken.)

If folks implement homeservers which assume that JSON received over federation is canonicalised, then the problem is on their side for now. When/if #2951 or https://github.com/matrix-org/matrix-doc/issues/1013 are merged, then everything will have to do the shuffle to go speak canonical JSON over federation, which will resolve this issue.

However, we have enough challenges right now supporting synapse and keeping matrix.org online and performant without jumping through new hoops to stop jzk's homeserver from crashing; instead for now he should canonicalise received JSON like everyone else does - or alternatively sit talking his own dialect of Matrix until we can sort the issues above.

@jevolk
Copy link

jevolk commented Mar 18, 2018

The number of times you have, without prompt, mentioned JZK and their (yet unreleased) homeserver to couch your reasoning for this ἀπολογία is curious, to say the least.

Here are the facts

  1. Matrix.org had the opportunity to configure simplejson's whitespace in synapse 0.26.1 and failed to do so.

  2. Synapse previously sent minified JSON but now adds arbitrary spaces between all values. Changes to the de facto output of network software, even small changes, especially to the lexical structure: are not to be taken lightly. A lot of software "out there," whether out of necessity, or out of performance demands, or out of incompetence: will make assumptions that may not align with the perfect letter of an RFC or a specification. The goals of network software developers include -- perhaps even stemming from postel's law itself -- to not break things even that aren't following the letter of a specification if one doesn't absolutely have to.

Conclusions on the facts

  1. Given the choice, ceteris paribus, of sending JSON with or without whitespace to best suit the needs of Matrix's application and performance demands: no whitespace is a better choice. Only the error messages have a solid justification for containing spaces for readability. By fixing this issue, huge amounts of waste across the entire system will be diminished.

  2. Given the responsibility matrix.org has as both patrons of the protocol and maintainers of its principal implementation, matrix.org should know that this is the reality of dispersed network software. Therefor, matrix.org should intend to minimize the changes made to their implementation. If matrix.org determines that to fix some security or protocol issue they must change their output to something which, though is de jure allowed by specifications, has never before been seen or tested in the wild: matrix.org must effort to minimize the scope of the change. In this case, involving whitespace was not something matrix.org absolutely had to do.

Involving whitespace in the latest release of Synapse was almost certainly an error. Surely, even an innocent one in the haste of trying to work around an attack on ujson. I see no technical case having been made to justify this on a merit, even something as trivial as:

synapse is a protocol reference homeserver, therefor we want its output to be human readable for developer reference purposes.

No. That case wasn't made. The case I see being made here straddles somewhere between "don't know" and "don't care." I think this is unbecoming of patrons to a burgeoning communication protocol that are striving for wider adoption.

The only thing worse now is that something absolutely trivial in both its technical inception and its technical resolution has turned into a needless debate. Everyone makes mistakes. Just commit the fix and move on.

@neilisfragile neilisfragile added enhancement z-p2 (Deprecated Label) labels Mar 27, 2018
@richvdh richvdh changed the title SimpleJSON library has added spaces to response data SimpleJSON library has added spaces to client API responses Apr 3, 2018
@richvdh richvdh changed the title SimpleJSON library has added spaces to client API responses SimpleJSON library has added spaces to API responses Apr 3, 2018
@richvdh
Copy link
Member

richvdh commented Apr 3, 2018

[The original report wasn't clear on whether the change covered just the CS API, or the Federation API too - I was confused by the reference to 'breaking compatibility with clients' in the description.

It looks like both are affected]

@richvdh
Copy link
Member

richvdh commented Apr 3, 2018

I'm afraid I don't agree that matrix.org has a duty to stick to one exact encoding of JSON. Responses are defined as (regular) JSON by the specification, and homeserver implementations are free to format that JSON however they like. JSON is well-specified, and clients should correctly handle any responses which are are so formatted. I'd almost prefer to change things regularly to shake out bad assumptions.

Canonical JSON (in particular, the requirement that object keys be sorted) is an unnecessary overhead for things which aren't signed, and we aren't going to specify another almost-JSON-but-not-quite. So JSON it stays, and JSON you must accept.

There is an argument that, as an implementation-specific optimisation, synapse could remove whitespace from responses for reasons of conserving bandwidth, however I think the bandwidth involved is trivial, so this is correctly tagged as an 'enhancement'. Still, feel free to send a PR to change it.

@jevolk
Copy link

jevolk commented Apr 4, 2018

According to your benchmark, simplejson:

without sorting:

   dumps (large obj)...
      first run: 0.022985
      20 loops, best of 5: 0.022135 sec per loop
   dumps (small objs)...
      first run: 0.027833
      20 loops, best of 5: 0.027811 sec per loop

with sorting:

   dumps (large obj)...
      first run: 0.022921
      20 loops, best of 5: 0.022529 sec per loop
   dumps (small objs)...
      first run: 0.029035
      20 loops, best of 5: 0.028178 sec per loop

The overhead for sorting your test vector barely registers above the margin of error. Consider that most federation responses only have a handful of keys, like "pdus" and "auth_chain" ... so that's one big single honking comparison right there.

P.S. When you do benchmarks you shouldn't use cpython time.time() (via import timeit) but instead sample the CPU via an rdtsc instruction:

uint64_t tsc;
__asm__ volatile ("rdtsc" : "=A"(tsc));

Just thought you should know.

@richvdh
Copy link
Member

richvdh commented Jun 5, 2019

not a bug.

@richvdh richvdh closed this as completed Jun 5, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
z-p2 (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

5 participants