Add option max_levels to decode #195
base: master
Conversation
I'm not immediately in love with this. My first thought is whether https://github.com/talentdeficit/jsx would be a better fit with its SAX-style parsing. Do you need to modify the JSON as you route it? I've been reluctant to merge a PR like #139 because it means that Jiffy's output isn't guaranteed to be correct since it's not checking the pre-encoded data.
Thanks for pointing me to jsx, I didn't know about that project. However, if I read it correctly, it would not suit my needs because, unfortunately, yes, I need to modify the root node. I'd also need to test the performance, because this PR avoids creating binaries and such that are not going to exist in the end result. I understand your concerns; would you be more keen if I included some way to validate the pre-encoded data?
Yeah, JSX being an option would have relied on that key being "early" in the JSON doc and on not requiring mutation.

My biggest concern around partially encoded JSON is Jiffy emitting malformed JSON. If we can come up with an implementation that is not overly complex and can be guaranteed to not produce invalid JSON, then I'd be on board. Previously, when I've noodled over this, I was expecting to almost have two parser implementations, which I more or less rejected as an option since it doubles the amount of logic that needs testing. However, this PR makes me think there might be a decent approach to handling things such that normal uses of the API would be able to guarantee things enough for me (I'm ignoring cases of users that reverse engineer any internal state to intentionally sidestep guarantees; in other words, it doesn't have to be ironclad, just not allow for accidental malformed JSON).

I hadn't really thought about attempting a no-op decoding approach like you did for your max levels. I think that leads to an interesting approach along the lines of:
Implementing a validation feature requires that we tweak the decoder structure to run through all checks while not generating Erlang terms. Throwing a boolean on the Decoder struct and then gating all calls would be straightforward, I believe. Note that this is different from your current approach, since that one bypasses some checks. Once we have that, I'd add a second function that was something like:
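(A rough usage sketch of the kind of function meant here; the name `validate_partial` and the state threading are illustrative guesses, not an actual jiffy API:)

```erlang
%% Hypothetical API: validate a (possibly partial) JSON binary by
%% running the decoder with term generation disabled. The returned
%% state records where the fragment stopped, so a later fragment
%% can resume validation at the right place.
{ok, State1} = jiffy:validate_partial(<<"{\"foo\": [1, 2">>, init),
{ok, _State2} = jiffy:validate_partial(<<", 3]}">>, State1).
```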
This would use the no-op decoding to ensure that the partial binary is in fact valid JSON. The state is required to know that we stop and end at the appropriate places.

When we then go through encoding, we'd extend the partial encoding loop that currently only handles bignums to stitch the JSON together. That will require adding/checking comma separators and the like, and making assertions on the partial JSON, i.e., that you can't stick KV pairs inside an array and generate malformed output.

Then, getting back to your original goal of having a max_levels decoding, we'd just flip on the partial encoding flag dynamically during decoding and return the opaque references that can then be fed back into the encoder without a second pass (which I am assuming covers your ask for disabling re-checking of already validated JSON). And then, after all of that, we write a whole bunch of tests.
One thing I'd note is that this approach does not allow something funky that affects multiple levels from multiple places, i.e., having two places where partial JSON is inserted that only becomes valid JSON after encoding. Something like:

```erlang
[
    {json, <<"{\"foo\": {\"bar\":">>},
    true,
    {json, <<"}}">>}
]
```

Or something. But I can't even figure out what that would look like.
Though, it occurs to me that we could add an encoder option that would encode values and return something like:

```erlang
Partial = jiffy:encode(Term, [partial]),
jiffy:encode({[{<<"foo">>, {[{<<"bar">>, Partial}]}}]}).
```
I'll work on it and give it a try.
Although I'm not completely sold on resources, due to the serialization limitation and the need for a process-independent environment to hold the binary, it works nicely. Can you check the PR and give me your thoughts?
(Sorry for the late reply, it's been a really hectic week.)

Just to note the changes: in 89ac61b I've updated the validator to check everything. The only thorn left is building the binary for a string if it has_escapes, to make use of the constructed binary; I'll see how it ends up playing with the offset and the buffer. (Objects are OK by construction.) In dc1fd56 the resource is created and returned, currently only for json_values(). I'll work next on having a validate(), which in my head is only an alias for {max_levels, 0}.
Looks like you're going in the right direction. I'll try to find time to give this a thorough review.
jiffy:json_raw is a reference(), so its opaqueness is carried within the term itself. With this change, you can check whether a subterm is a json_raw with is_reference/1.
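(A tiny hypothetical helper showing that check; the function name is made up:)

```erlang
%% With json_raw represented as a reference(), a plain guard is
%% enough to tell opaque raw JSON apart from decoded EJSON terms.
is_json_raw(Term) when is_reference(Term) -> true;
is_json_raw(_) -> false.
```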
This PR adds the option `max_levels` to `jiffy:decode`. It allows you to pass `{max_levels, non_neg_integer()}` as an option; Jiffy then decodes up to N levels of the JSON, and anything deeper than that is returned as `{json, binary()}` instead. This PR couples nicely with #139, allowing `jiffy:encode(jiffy:decode(Binary, [{max_levels, X}]))`.
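For illustration, a decode might look like this (a sketch based on the description above; the exact output shape and level counting are assumptions, not verified behaviour):

```erlang
%% Assumed behaviour: structures past the cut-off come back wrapped
%% as {json, Binary} rather than being decoded into Erlang terms.
Doc = <<"{\"route\":\"users\",\"payload\":{\"id\":1}}">>,
{Props} = jiffy:decode(Doc, [{max_levels, 1}]),
%% Props might then look like:
%%   [{<<"route">>, <<"users">>},
%%    {<<"payload">>, {json, <<"{\"id\":1}">>}}]
```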
Why is it useful?
In my case, I use Erlang + jiffy to route streamed JSON documents to different endpoints based on a routing key that's near the top of the document. Decoding and encoding the whole document every time seems wasteful.
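A minimal sketch of that routing pattern, assuming the #139-style `{json, _}` passthrough on encode (the document shape, the `route` key, and the module are invented for the example):

```erlang
-module(router_example).
-export([route/1]).

%% Decode only the top level, read the routing key, and re-encode
%% without ever materializing the nested payload as Erlang terms.
route(Bin) ->
    {Props} = jiffy:decode(Bin, [{max_levels, 1}]),
    RouteKey = proplists:get_value(<<"route">>, Props),
    %% Relies on the encoder accepting {json, binary()} passthrough
    %% terms (the #139 behaviour this PR is meant to pair with).
    {RouteKey, jiffy:encode({Props})}.
```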
A simple test with this shows a great improvement for me; however, I don't know if it'd be worth including for everyone.