Home
JSON.hpack is a lossless, cross-language, performance-focused data set compressor.
It can reduce by up to 70% the number of characters used to represent a generic homogeneous collection.
One of the most classic uses of the JSON format is to reproduce a database query result and send it back to the client.
For example, let's say we have an employee table with 4 fields: name, age, gender, skilled.
The classic representation of a query like
SELECT * FROM db_company.employee

------------------------------------
| name    | age | gender | skilled |
------------------------------------
| Andrea  | 31  | Male   | true    |
------------------------------------
| Eva     | 27  | Female | true    |
------------------------------------
| Daniele | 26  | Male   | false   |
------------------------------------
will generate in almost every language an array of objects, represented in this way via XML:
<result>
  <item>
    <name type="string">Andrea</name>
    <age type="number">31</age>
    <gender type="enum">Male</gender>
    <skilled type="boolean">true</skilled>
  </item>
  <item>
    <name type="string">Eva</name>
    <age type="number">27</age>
    <gender type="enum">Female</gender>
    <skilled type="boolean">true</skilled>
  </item>
  <item>
    <name type="string">Daniele</name>
    <age type="number">26</age>
    <gender type="enum">Male</gender>
    <skilled type="boolean">false</skilled>
  </item>
</result>
As everybody knows
JSON is The Fat-Free Alternative to XML
So its natural representation of the above query result is this:
[
  {
    "name":"Andrea",
    "age":31,
    "gender":"Male",
    "skilled":true
  },
  {
    "name":"Eva",
    "age":27,
    "gender":"Female",
    "skilled":true
  },
  {
    "name":"Daniele",
    "age":26,
    "gender":"Male",
    "skilled":false
  }
]
Compared to XML, we have completely removed the redundant open/close tags, and we do not have to specify the value type in order not to lose this information.
But there is still something redundant here: every single object in the list has the same properties.
We can then say that JSON property names are redundant.
The most common practice when serving documents such as XML, plain text, or JSON is to send gzip or deflate compressed output. Unfortunately, even though every browser has a built-in zlib module to decompress responses once requests complete, JavaScript cannot use this feature to compress or decompress strings itself. The limit is therefore “mono-directional”: thanks to JSON and gzipped output the server can send huge amounts of data to the client without compromising either bandwidth or response time, but the client cannot do the same in the other direction.
The big limit appears when we manipulate a received collection and send back a considerable amount of data in “one shot” (the received collection itself, why not). Thanks to JSON.hpack we can send from client to server up to 70% fewer characters than a normal JSON POST request.
The result is faster interaction in both directions. Even if JavaScript or the server spends a few milliseconds packing or unpacking long collections, the total time elapsed between the sent action and the response, as well as the total bandwidth used to send and receive (think about mobile connections too), will be lower.
For this reason, the most important thing is to have as many server-side implementations as possible, in order to unpack collections sent by the client, or to understand that data and manipulate it on the server without problems.
The unpack operation is indeed truly fast and simple to implement as well.
To send data back to the client we can still use gzipped/deflated strings, especially because these compression algorithms are both fast and bandwidth-saving.
In summary, without gzip the generated JSON.hpack output could drop from 70Kb to 26Kb, while with gzip the difference will not be as significant (repeated JSON property names compress well).
The main feature of JSON.hpack is that it removes keys (property names) from the structure, creating a header at index 0 that lists each property name. This header can be represented in this way:
["name","age","gender","skilled"]
Respecting the header order, every other element in the collection will simply have the value, rather than the key plus its value:
["Andrea",31,"Male",true],["Eva",27,"Female",true],["Daniele",26,"Male",false]
The example above is compression level 0 of JSON.hpack, so the final result will be:
[["name","age","gender","skilled"],["Andrea",31,"Male",true],["Eva",27,"Female",true],["Daniele",26,"Male",false]]
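The level 0 transformation can be sketched in a few lines of JavaScript. This is only an illustration, not the library's own code: packLevel0 is a hypothetical name, and the sketch assumes every object has the same keys as the first one and that an empty collection packs to an empty header.

```javascript
// Hypothetical helper: a sketch of level 0 packing, not the JSON.hpack source.
function packLevel0(collection) {
  // Assumption: an empty collection is packed as a single empty header.
  if (collection.length === 0) return [[]];
  // Assumption: the first object defines the keys (and their order) for all rows.
  const header = Object.keys(collection[0]);
  // One array of values per object, in header order.
  const rows = collection.map((item) => header.map((key) => item[key]));
  return [header].concat(rows);
}

const employees = [
  { name: "Andrea", age: 31, gender: "Male", skilled: true },
  { name: "Eva", age: 27, gender: "Female", skilled: true },
  { name: "Daniele", age: 26, gender: "Male", skilled: false }
];

const packed = packLevel0(employees);
console.log(JSON.stringify(packed));
// Compare the number of characters before and after packing.
console.log(JSON.stringify(employees).length, JSON.stringify(packed).length);
```

The two lengths printed at the end give a quick way to see how many characters the removed property names were costing for a given collection.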
It is possible to reduce the size of the JSON string even further by assuming that a result set will contain duplicated entries, like true/false, names, roles, addresses, cities, countries, etc. (numbers excluded).
Compression level 1 then converts every value into an enum list, and puts the created enum indexes in place of the values.
To do this, the header list needs to contain the enum to evaluate as the entry right after the property name.
["name",["Andrea","Eva","Daniele"],"age","gender",["Male","Female"],"skilled",[true,false]]
The reason numbers are excluded from this operation is that, in my opinion, it does not make much sense to swap numbers (values) for numbers (indexes). If you think I am wrong, let's discuss it :-)
In summary, JSON.hpack level 1 will produce this result:
[["name",["Andrea","Eva","Daniele"],"age","gender",["Male","Female"],"skilled",[true,false]],[0,31,0,0],[1,27,1,0],[2,26,0,1]]
After the header at index 0, we will have only indexes or native numbers (“age” field) but as you can see, this procedure could generate a bigger JSON string, so let’s move on.
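A rough sketch of the level 1 idea follows. packLevel1 is a hypothetical name, and the rule used to detect a numeric column (every value is a JavaScript number) is an assumption of this sketch, as is the restriction to primitive values.

```javascript
// Hypothetical helper: a sketch of level 1 packing, not the JSON.hpack source.
// Assumes a non-empty collection of objects sharing the same keys,
// with primitive values only.
function packLevel1(collection) {
  const keys = Object.keys(collection[0]);
  const header = [];
  const enums = new Map(); // key -> list of distinct values (absent for numeric columns)
  for (const key of keys) {
    header.push(key);
    const values = collection.map((item) => item[key]);
    // Assumption: a column is "numeric" when every value is a number.
    if (values.every((value) => typeof value === "number")) continue;
    const distinct = [...new Set(values)]; // keeps first-seen order
    enums.set(key, distinct);
    header.push(distinct); // the enum follows its property name in the header
  }
  const rows = collection.map((item) =>
    keys.map((key) =>
      enums.has(key) ? enums.get(key).indexOf(item[key]) : item[key]
    )
  );
  return [header].concat(rows);
}

const employees = [
  { name: "Andrea", age: 31, gender: "Male", skilled: true },
  { name: "Eva", age: 27, gender: "Female", skilled: true },
  { name: "Daniele", age: 26, gender: "Male", skilled: false }
];

console.log(JSON.stringify(packLevel1(employees)));
```

With the example collection this reproduces the level 1 result shown above, including the name enum that makes the string longer.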
If every object has a unique value for a property, as is the case here for both age (a number) and name (a string), level 2 tries to work out whether the enum list created in the header is worthy.
By worthy I mean this: if the enum is as long as the entire collection, every index used as a value is just one more character added to the final result.
Accordingly, level 2 generates a different header, preserving the two worthy enum lists but removing the name one.
[["name","age","gender",["Male","Female"],"skilled",[true,false]],["Andrea",31,0,0],["Eva",27,1,0],["Daniele",26,0,1]]
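The level 2 check can be sketched as one extra condition: skip the enum when it would be as long as the collection itself. As before, packLevel2 is a hypothetical name and the numeric-column test is an assumption of this sketch, not something taken from the library.

```javascript
// Hypothetical helper: a sketch of level 2 packing, not the JSON.hpack source.
// Assumes a non-empty collection of objects with the same keys and primitive values.
function packLevel2(collection) {
  const keys = Object.keys(collection[0]);
  const header = [];
  const enums = new Map();
  for (const key of keys) {
    header.push(key);
    const values = collection.map((item) => item[key]);
    // Assumption: numeric columns are the ones where every value is a number.
    if (values.every((value) => typeof value === "number")) continue;
    const distinct = [...new Set(values)];
    // Cheap redundancy check: if every value is unique, the enum is as long
    // as the collection and the indexes would only add characters.
    if (distinct.length === values.length) continue;
    enums.set(key, distinct);
    header.push(distinct);
  }
  const rows = collection.map((item) =>
    keys.map((key) =>
      enums.has(key) ? enums.get(key).indexOf(item[key]) : item[key]
    )
  );
  return [header].concat(rows);
}

const employees = [
  { name: "Andrea", age: 31, gender: "Male", skilled: true },
  { name: "Eva", age: 27, gender: "Female", skilled: true },
  { name: "Daniele", age: 26, gender: "Male", skilled: false }
];

console.log(JSON.stringify(packLevel2(employees)));
```

The check only looks at how many distinct values a column has, which keeps it fast but says nothing about whether the enum actually saves characters.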
Because its redundancy check is deliberately cheap and performance-focused, level 2 is still fast, but its results may not be much better than, or much different from, those of level 0 or 1.
So far, level 3 is the best option for JSON.hpack.
Level 3 skips the level 2 checks and directly performs an expensive optimization: it compares the string length of the homogeneous collection with the enum in the header and indexes as values against its length without the enum and with the original values.
This kind of check is performed for every column, except numeric ones, if any.
The limit of this compression level is that length check is performed for each column but not for the full collection.
This means that level 3 will produce this output:
[["name","age","gender",["Male","Female"],"skilled"],["Andrea",31,0,true],["Eva",27,1,true],["Daniele",26,0,false]]
The only worthy enum list is then the gender one; for skilled, true and false in the list plus 0,0,1 as indexes are not worthy:
["Male","Female",0,1,0] < ["Male","Female","Male"] (23 characters against 24)
[true,false,0,0,1] > [true,true,false] (18 characters against 17)
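The per-column comparison above can be reproduced with a small function. This is a sketch that mirrors the arithmetic of the two comparisons, assuming primitive values; isEnumWorthy is a hypothetical name, and the real level 3 code may measure lengths differently.

```javascript
// Hypothetical helper: decides whether one column is worth an enum by
// comparing serialized lengths, as in the two comparisons above.
function isEnumWorthy(values) {
  const distinct = [...new Set(values)];
  const indexes = values.map((value) => distinct.indexOf(value));
  // Enum entries followed by the indexes that replace the values...
  const withEnum = JSON.stringify(distinct.concat(indexes)).length;
  // ...against the original values left in place.
  const withoutEnum = JSON.stringify(values).length;
  return withEnum < withoutEnum;
}

console.log(isEnumWorthy(["Male", "Female", "Male"])); // true  (23 < 24)
console.log(isEnumWorthy([true, true, false]));        // false (18 > 17)
console.log(isEnumWorthy(["Andrea", "Eva", "Daniele"])); // false, every name is unique
```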
Level 4 does not perform any compression of its own: it runs all the others and returns the best result among levels 0, 1, 2, and 3.
For example, with the collection used so far the best option is level 0, so level 4 will return:
[["name","age","gender","skilled"],["Andrea",31,"Male",true],["Eva",27,"Female",true],["Daniele",26,"Male",false]]
This level is based on the public method hbest(homogeneousCollection), which returns the best level to use.
The important thing to remember is that once we know which level is the best one for a specific collection, we can assume that level will be worthy for every result based on that list.
In short, the hbest method should be used only during development to find out which level is the best one, while in production we can simply specify that option manually.
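One plausible way to pick a level is to pack with each one and keep the shortest JSON string. The sketch below assumes exactly that; bestLevel and stubPack are hypothetical names, and the stub simply returns the four outputs listed on this page so the example can run without the library.

```javascript
// Hypothetical helper, not the library's hbest: tries levels 0 to 3 and
// returns the one whose JSON string is shortest (lowest level wins ties).
// `pack` is any function with the hpack(collection, level) shape.
function bestLevel(collection, pack) {
  let best = 0;
  let bestLength = Infinity;
  for (let level = 0; level <= 3; level++) {
    const length = JSON.stringify(pack(collection, level)).length;
    if (length < bestLength) {
      best = level;
      bestLength = length;
    }
  }
  return best;
}

// Stand-in packer for the demo: the level 0 to 3 outputs shown on this page.
const outputsFromThisPage = [
  [["name","age","gender","skilled"],["Andrea",31,"Male",true],["Eva",27,"Female",true],["Daniele",26,"Male",false]],
  [["name",["Andrea","Eva","Daniele"],"age","gender",["Male","Female"],"skilled",[true,false]],[0,31,0,0],[1,27,1,0],[2,26,0,1]],
  [["name","age","gender",["Male","Female"],"skilled",[true,false]],["Andrea",31,0,0],["Eva",27,1,0],["Daniele",26,0,1]],
  [["name","age","gender",["Male","Female"],"skilled"],["Andrea",31,0,true],["Eva",27,1,true],["Daniele",26,0,false]]
];
function stubPack(collection, level) {
  return outputsFromThisPage[level]; // the stub ignores its input
}

console.log(bestLevel([], stubPack)); // 0
```

Running every level is the expensive part, which is why the choice is best made once during development and then hard-coded.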
Every method in JSON.hpack is public and static. These methods are:
- hpack( Array[, Int[0-4]] ):HArray, converts a homogeneous collection into a valid hpack collection
- hunpack( HArray ):Array, converts a valid hpack collection back into the original homogeneous collection. Please note that it does not matter which compression level was used to generate the HArray, because the Array regeneration is compression independent (simple, fast logic)
- hbest( Array ):Int[0-4], returns the ideal compression level for the specified homogeneous collection
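To illustrate why unpacking does not depend on the compression level, here is a minimal sketch based only on the header layout described on this page: a property name, optionally followed by its enum array. unpack is a hypothetical name, not the library's hunpack, and the sketch assumes property names are always strings.

```javascript
// Hypothetical helper: a sketch of level-independent unpacking.
function unpack(packed) {
  const header = packed[0];
  const rows = packed.slice(1);
  // Read the header as a list of columns: a name, optionally followed
  // by the enum array holding that column's distinct values.
  const columns = [];
  for (let i = 0; i < header.length; i++) {
    const column = { name: header[i], values: null };
    if (Array.isArray(header[i + 1])) {
      column.values = header[i + 1];
      i++; // skip the enum entry just consumed
    }
    columns.push(column);
  }
  // Rebuild each object, resolving enum indexes where a column has an enum.
  return rows.map((row) => {
    const item = {};
    columns.forEach((column, j) => {
      item[column.name] = column.values ? column.values[row[j]] : row[j];
    });
    return item;
  });
}

const level1 = [["name",["Andrea","Eva","Daniele"],"age","gender",["Male","Female"],"skilled",[true,false]],[0,31,0,0],[1,27,1,0],[2,26,0,1]];
const level3 = [["name","age","gender",["Male","Female"],"skilled"],["Andrea",31,0,true],["Eva",27,1,true],["Daniele",26,0,false]];

console.log(JSON.stringify(unpack(level1)) === JSON.stringify(unpack(level3))); // true
```

The same loop handles headers with no enums, some enums, or an enum for every column, so the unpacker never needs to know which level produced its input.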
If the global JSON object is present, the JavaScript version will simply add the public methods described above to it. If the JSON object is not present, this file will create an empty object with the API methods.
This means that in any case JSON.hpack is compatible with the native JSON object, if present, or with Douglas Crockford's well-known JSON object.
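// eval is kept from the original example; prefer JSON.parse where it is available
var obj = JSON.hunpack(eval("(" + xhrGet.responseText + ")"));
obj[1].name = "Madonna";
// hpack returns an array, so it is serialized before being sent
xhrPut.send("collection=" + encodeURIComponent(JSON.stringify(JSON.hpack(obj, 3))));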
Rather than a class or a public object, the PHP version respects its language nature, adding 4 functions in the global scope:
json_hpack, json_hunpack, json_hbest, and a “private use only” json_hunpack_createRow.
Please note that the current PHP version works only with objects fetched from a database, rather than with associative arrays.
...
if(isset($_POST['collection']))
    $_SESSION['clientCollection'] = json_hunpack(json_decode($_POST['collection']));
$collection = array();
while($row = mysql_fetch_object($query))
    $collection[] = $row;
echo json_encode(json_hpack($collection, 3));
The C# version requires a reference to the System.Web.Extensions assembly, because it uses the System.Web.Script.Serialization namespace.
You can find the latest System.Web.Extensions assembly here: ASP.NET AJAX
In this case the class is public static (except for a property which makes static declaration useless).
In the current version each object of the homogeneous collection is a Dictionary<string, object>, so the homogeneous collection should be a List<Dictionary<string, object>>.
These types are, in my opinion, the most suitable for JSON representation, but I may be wrong, so please do not hesitate to suggest a better type.
...
JavaScriptSerializer json = new JavaScriptSerializer();
// declared outside the if: C# does not allow a declaration as the body of an if
List<Dictionary<string, object>> collection = new List<Dictionary<string, object>>();
if(Request["collection"] != null)
    collection = json.Deserialize<List<Dictionary<string, object>>>(Request["collection"]);
Response.Write(json.Serialize(JSONH.pack(collection, 3)));
I am planning to release a Python version as soon as possible, and I am looking for help to create Ruby, Java, Perl, and other common server-side language versions, including a possible PHP extension in C to make JSON.hpack natively fast.
Please contact me if you would like to add your favorite language version, thank you.