libjson is a small C library and small codebase that packs an efficient parser and a configurable printer. libjson is covered by the LGPLv2 license, or at your option the LGPLv3 license.
Here's the feature of libjson:
Get the JSON data to the parser any way you want; by appending char by char, or string chunks, the input reading is completely left to the caller.
easy integration with any model by the means of a simple callback.
handcoded parser and efficient factorisation make the code smalls.
use efficient code, and small parsing tables to not do any extra work and remains as fast and efficient as possible.
tested through a small and precise testsuite.
callback only string of data and leave the actual representation of data to the caller
- Supports putting limits on nesting level. security against DoS over very deep data.
- Supports putting limits on data (string/int/float) size. security against DoS over very large data object.
Optionally support YAML/python comments and C comments.
Supports projects-specific allocation functions to integrate completely with projects
jsonlint utility provided with the library to verify, or reformat json stream. also useful as example on how to use the library.
the interruptible parser permits the user to incrementally parse data. each time new data is passed to the parser, the parser potentially emits call back to the caller to give a found value, and every time update its internal state machine. this state machine will returns an error when the data passed doesn't end up beeing a JSON compliant stream.
Because of the interruptibility of the parser, the caller is in charge of feeding data to the parser, which give the flexibility to the caller to channel the data to the parser in its own ways. for example, embedded stream of json in another stream, can dynamically and without extra allocation, be passed to the parser. this effectively decoupled the parser from the stream of data.
This also reduce the API related to parser's input to only one function, instead of specific functions to parse a string in memory, a C FILE handler, a unix file descriptor, etc.
There's no object model embedded in the library. This way the object model is left to the caller wants and needs. like a SAX API, the parser callback on opening and closing structures, and also for every data values.
The caller is free to represent a json object as a GLIB hashtable, an handmade chained list, etc. The same is true for any JSON types. This cut down the usual rewritting of the library object tree to the native object tree after parsing that takes memory and cpu time.
Float and int are passed to the caller directly as a string. Since JSON integer and float are not limited to a specific size, this is up to the caller to specify how to read the string. for example one implementation can choose to represent the json int as very big int (GMP), but some other implementation can choose 64 bits int and truncate data.
If the callback return a non null value, the parser will return failure, so that you can stop the parsing.
The design of the library is to emphasis security. the event based parser, the carefully secure crafted code, and the optional limits in place means that the parser can be put on the front line and will behave in front of DoS and huge data.
The parser supports maximum size in the data buffer for string/float/int and supports a maximum nesting level.
If the data limit is set to 0, the buffer will double when necessary.
If the nestimg limit is set to 0, the stack will double when necessary.
All string data are passed to the caller with an explicit size; NULL bytes can be handled properly.
The libjson API is very simple.
every functions returns a int that is either 0 for success, or a json_error
if different from 0.
all callback takes an optional void *
to be able to pass a pointer to anything to the callback function.
There's no assumption about how a file is supposed to be read, or how we suppose to allocate memory.
All libjson structure are fully available for the library user so that they can also be allocated on the function stack for pratical purpose.
The first thing you need to parse a document is a parsing context. Every parsing function take a context as a first argument, and this is to keep the parsing state machine and the event callback mechanism. It also contains optional user defined memory functions and the parser configuration (limits, comments allowed).
The following example initializes a new parser context:
- a NULL config which means the default parser config.
my_callback
a function that will be called each time there's a JSON event.my_callback_data
avoid *
value that will be passed tomy_callback
every times it's called.
json_parser parser;
if (json_parser_init(&parser, NULL, my_callback, my_callback_data)) {
fprintf(stderr, "something wrong happened during init\n");
}
Remember that's the parser need to allocate some data, so when the parser context is not used anymore, it needs to free with the appropriate function:
json_parser_free(&parser);
The only thing left is feeding data into the parser. This is done with the
json_parser_string
function which takes a string data and a length, and an
optional offset pointer. This function is completly incremental, which means
you don't have to parse all your document at once.
You're in charge of passing the data, from your file, your socket, your pipe, your in-memory string to the parsing function.
The following example take a in-memory string and pass it to json_parser_string
1 characters by 1 characters:
char my_json_string[] = "{ \"key\": 123 }";
for (i = 0; i < strlen(my_json_string); i += 1) {
ret = json_parser_string(&parser, my_json_string + i, 1, NULL);
if (ret) {
/* error happened : print a message or something */
break;
}
or 4 characters by 4 characters:
char my_json_string[] = "{ \"key\": 123 }";
for (i = 0; i < strlen(my_json_string); i += 4) {
ret = json_parser_string(&parser, my_json_string + i, 4, NULL);
if (ret) {
/* error happened : print a message or something */
break;
}
or from a file reading 1024-bytes blocks at the same time:
int fd, len, ret;
char block[1024];
fd = open(file, ...);
while ((len = read(fd, block, 1024)) > 0) {
ret = json_parser_string(&parser, block, len, NULL);
if (ret) {
/* error happened : print a message or something */
break;
}
}
Each time the function json_parser_string
function is called with data, the
callback registered at context init time might be called.
There's a callback each time the parser has processed a JSON atom.
a JSON atom can be:
JSON_OBJECT_BEGIN
: opening a new objectJSON_OBJECT_END
: current object is closingJSON_ARRAY_BEGIN
: opening a new arrayJSON_ARRAY_END
: current array is closingJSON_INT
: a JSON int has been parsedJSON_FLOAT
: a JSON float has been parsedJSON_STRING
: a JSON string has been parsedJSON_KEY
: a JSON key has been parsedJSON_TRUE
: a JSON true constant has been parsedJSON_FALSE
: a JSON false constant has been parsedJSON_NULL
: a JSON null constant has been parsed
A callback prototype looks like the following:
userdata
: is the callback object registered at context init time.type
: is the type of the atom parsed.data
: is a optional string that contained the data associated with this atom.length
: is the length in byte of the data associated with the atom.
int my_callback(void *userdata, int type, const char *data, uint32_t length)
the following example is a full callback function that just print the atom received. as a callback object it can support an optional FILE * to print to, otherwise it will use stdout.
int my_callback(void *userdata, int type, const char *data, uint32_t length)
{
FILE *output = (userdata) ? userdata : stdout;
switch (type) {
case JSON_OBJECT_BEGIN:
case JSON_ARRAY_BEGIN:
fprintf(output, "entering %s\n", (type == JSON_ARRAY_BEGIN) ? "array" : "object");
break;
case JSON_OBJECT_END:
case JSON_ARRAY_END:
fprintf(output, "leaving %s\n", (type == JSON_ARRAY_END) ? "array" : "object");
break;
case JSON_KEY:
case JSON_STRING:
case JSON_INT:
case JSON_FLOAT:
fprintf(output, "value %*s\n", length, data);
break;
case JSON_NULL:
fprintf(output, "constant null\n"); break;
case JSON_TRUE:
fprintf(output, "constant true\n"); break;
case JSON_FALSE:
fprintf(output, "constant false\n"); break;
}
}
Parser configuration can be set when initializing the parsing context. this is done by
passing a non-NULL pointer to a valid json_config
.
The configuration structure support 7 differents variables, which can be group in 3 categories:
- user defined memory functions.
- security.
- optional extensions.
The library user can choose to redefine its own allocation functions (realloc
and calloc), in this case the parser will allocate using those functions. this
is controlled by user_calloc
and user_realloc
.
there's 2 security settings available: max_nesting
and max_data
.
max_data
control the size of data allowed, which control the maximum size of
int, string, float, and keys in bytes. this is directly connected to the
size of the data buffer. if set to 0, the buffer will try to grow each time
it's necessary.
max_nesting
controls the number of nested structures allowed by the parser.
each new nesting increases the parser memory use by 1 byte (for 4096 nested
structures, you need 4K of memory).
For security purpose, if the parser is directly connected to a network stream, setting those variables, is strongly recommended.
You can enable C comments and enable YAML comments arbitrarly from each other.
allow_c_comment
will enable C comment: starting at /*, and finishing at */
allow_yaml_comment
will enable YAML/python comment: starting at # and finishing
at the end of line.
comments cannot be nested.
{
# this is a YAML comment
"key": /* this is a C comment */ 123,
}
first, the printing context is an object that is passed to all printing functions. The main part of the printing context is the printing callback. There's also some state about indentation level and pretty printing state.
The following example initializes a new printing context:
- a NULL config which means the default parser config.
my_callback
a function that will be called each time there's a JSON event.my_callback_data
avoid *
value that will be passed tomy_callback
every times it's called.
json_printer print;
if (json_print_init(&print, my_callback, my_callback_data)) {
fprintf(stderr, "something wrong happened during init\n");
}
The printing context need to free after use using the following:
json_print_free(&print);
You can choose between pretty printing and raw printing.
raw printing is usually associated for transfering or storing JSON data: it's
terse for common human usage. the function json_print_raw
is associated with
raw printing
pretty printing is usually targeted for human processing and contains spaces
and line return. the function json_print_pretty
is associated with pretty
printing
Both functions can be interchanged as they have the same prototype.
the following example print an object with one key "a" with a value of 2505:
json_print_pretty(&print, JSON_OBJECT_BEGIN, NULL, 0);
json_print_pretty(&print, JSON_KEY, "a", 1);
json_print_pretty(&print, JSON_INT, "2505", 4);
json_print_pretty(&print, JSON_OBJECT_END, NULL, 0);
The printing API only transform a JSON atom into the equivalent string; the callback is called one or multiples time at each atom, so that the user can choose what to do with the marshalled data.
The API is not in charged of allocating a in-memory string or store the marshalled data in a file, however doing anything similar is very easy.
The following example printing callback would write the previous structure to disk. The file descriptor is passed through the callback object as void *, and need to be valid.
int my_printer_callback(void *userdata, const char *s, uint32_t length)
{
int fd = (int) userdata;
write(fd, s, lenght);
return 0;
}
For conveniance, there's a json_print_args
API function that takes multiples JSON atoms
and repeatly call the associated printing function (raw or pretty). the
function parameters need to be terminated by a -1 atom to indicate the end of arguments.
Here's an example on how to print the same object as previous example but with one call:
json_print_args(&print, json_print_pretty,
JSON_OBJECT_BEGIN,
JSON_KEY, "a", 1,
JSON_INT, "2505", 4,
JSON_OBJECT_END,
-1);
For convenience, there's some helpers function that permits constructing JSON DOM tree. they are built on top of the event based parsing API described earlier.
a DOM parsing context hold the state of the DOM building. it also hold 3 differents callbacks to create the JSON values in the tree.
the first callback, create_structure
, is called when a new object or array
is suppose to be created. the callback can choose the representation of the data, but
need to returns a single void * to represent this new structure.
the second callback, create_data
, is called for each value that is not
an object or an array. it need returning a void * to represent this value.
the last callback, append
, is called each time we need to append a value to an
existing structure (array/object). if called with a non-NULL key, then the
first argument of the function represent an object, otherwise an array.
the following example hook into a hypothetical framework with array and object special method:
void *tree_create_structure(int nesting, int is_object)
{
if (is_object)
return new_object();
else
return new_array();
}
void *tree_create_data(int type, const char *data, uint32_t length)
{
switch (type) {
case JSON_STRING:
case JSON_INT:
case JSON_FLOAT:
return new_json_value(type, data, length);
case JSON_NULL:
case JSON_TRUE:
case JSON_FALSE:
return new_json_const(type);
}
}
int tree_append(void *structure, char *key, uint32_t key_length, void *obj)
{
if (key != NULL) {
my_object *object = structure;
append_object(object, key, key_length, obj);
} else {
my_array *array = structure;
append_array(array, obj);
}
}
the following example hooks the DOM helper parser into the event parser:
json_parser_dom helper;
json_parser parser;
json_parser_dom_init(&helper, create_structure, create_data, append);
json_parser_init(&parser, json_parser_dom_callback, &helper);
JSONlint is a small utility using libjson. it's able to verify and reformat JSON file.
the following will reformat input.json file and create a file output.json:
jsonlint --format input.json -o output.json