Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nix parse: parse a nix expr or nix file to aterm or json syntax tree (AST) #5512
nix parse: parse a nix expr or nix file to aterm or json syntax tree (AST) #5512
Changes from 24 commits
7211edd
f57c978
7329c19
137df01
a9a0e19
36581e6
658d93b
d1a5880
abfe441
cc92fab
7a02477
76ab163
bee4f91
4b19384
0e42d01
1c09c50
35219ca
0de2923
87fcaa6
8197c5f
bd55eed
182f20e
ed18b49
6adba4a
d50ba1a
834f883
8ca3912
b5f8967
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably produce a descriptive error where someone has to look to fix the situation. I could imagine that with the next subclass of Expr it might be easily forgotten and then this will only yield an obscure runtime error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uhm ... this is just copy-paste from
Expr::show
which i renamed toExpr::showAsAterm
i guess this means "end of input"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh, this sin't "end of input" but instead terminates the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://en.cppreference.com/w/cpp/utility/program/abort
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was added in d4f0b0fc6c by @edolstra - maybe he knows : )
my second guess is: this means "empty input"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance we could use a proper JSON serialization library instead? Nix already uses nlohmann JSON and has a home-grown JSON serialiser in
src/libutil/json.hh
.Could the interface could be
void ExprType::showAsJson(json& list)
instead and an example implementation could look like this:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define proper ... this is working, and its fast
i did not use nlohmann, cos it has a different internal datastructure, iirc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would make an interesting benchmark ... someone, anyone? ; )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... i would rather add a machine-readable json format, as i described in #4726 (comment)
benefit: json is smaller, can be parsed faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you care about performance, maybe JSON is not the best format you can aim for. And if you really care about performance, you should aim for a lazy event-driven API (the XML parser folks pioneered this and called it "SAX").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just merge my patch. : P
the whole point of my patch is to provide a FAST nix-to-json parser
and your "pretty" solution via some other high-level json printer is (probably) slower
"probably" = i did no benchmark, but its an educated guess
compare my 30 lines in src/libexpr/nixexpr-as-json.cc
(no need to print ascii, no need to validate input)
with 250 lines in nlohmann's
dump_escaped
https://github.com/nlohmann/json/blob/develop/include/nlohmann/detail/output/serializer.hpp#L238
https://github.com/nlohmann/json/blob/develop/include/nlohmann/detail/output/serializer.hpp#L382
how often does that happen? once in five years?
the only "special care" i can think of is keeping backward compatibility
by appending new types at end-of-file in src/libexpr/nixexpr-node-types.def
what do you mean?
the json spec is constant
when bugs appear, someone will fix them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utf8 test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
protobuf
yepp, a stream parser, let me put this on my neverending todo list : D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im starting to understand your concern
so, what im currently doing:
escape bytes 0 to 92 = ascii control chars (\n \r \b ...), doublequotes, backslash
for example, null byte → \u0000
everything else (bytes 93 to 255) is simply passed through as is
does this work with unicode?
as for the json spec, it is allowed to have raw unicode in json strings
when unicode is used in json, it must be utf16, for example \u0001(nope, utf16 in javascript)can the escaping break valid unicode?
lets look at the valid unicode byte ranges
ascii bytes are from 0 to 127 = 7 bit
so the first unicode byte must be 128 to 255
what about the following unicode bytes?
https://lemire.me/blog/2018/05/09/how-quickly-can-you-check-that-a-string-is-valid-unicode-utf-8/
converted with hex2dec.py
so ... ALL unicode bytes are in the range from 128 to 255
and since i escape only bytes 0 to 92, this works : )
one rare edgecase, where this could break: non-unicode input, for example latin1 encoding.
some ascii control chars, like \1 are encoded as unicode \u0001,
so the result string can be mixed unicode and latin1. simple solution: blame the user.
either for throwing non-unicode strings at nix, or for throwing ascii control chars at nix.
(ascii control chars are worse than non-unicode strings)