-
Notifications
You must be signed in to change notification settings - Fork 0
anmcn/chutney
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
CHUTNEY ======= Chutney is a portable C library implementing a subset of the Python "pickle" serialisation protocol, and a Python binding for the library. Chutney supports serialising the following types: None (null) int float (C double) str (8 bit clean strings) utf-8 (UTF-8 encoded strings) tuple (list) dict (map) instance (a named dictionary, essentially) The code implementing the chutney library is in the "chutney" subdirectory. A Python binding to the library, "chutney.c" is in the top level directory, as are Python-level unit tests (which exercise both the Python binding and the underlying library). The protocol used is a subset of pickle protocol 1 and 2. The parser probably will not parse Python generated pickles except in simple cases, but the Python pickle/cPickle modules should parse the pickles we generate (and it should be possible to use pickletools.dis on generated chutneys for debugging purposes). Python chutney binding ====================== The Python binding for chutney currently presents only two methods - "dumps" and "loads", as well a base error class ChutneyError, and two specific error classes, UnpickleableError and UnpicklingError. The chutney Python API attempts to be similar to the pickle API, however there are some important differences: * only the specific types mentioned above are supported - other objects will generate an UnpickleableError. * memoising is not supported (at this time). Consequently, reference cycles are not detected, and multiple references to the same structure will result in the structure being saved in it's entirety multiple times. * modules are NOT imported when unpickling instances - they must already be in sys.modules. * python lists are sent as tuples to keep things simple. * when instantiating objects, the class __init__ is NOT called, and the attribute dictionary is added directly to the instance __dict__. * __getstate__, __setstate__, __reduce__, __reduce_ex__ and copy_reg are NOT supported, nor are slotted classes. Only the instance __dict__ is saved. These changes were motivated either by the desire to keep the chutney library simple (no memoising, limited type support) or to make the unserialisation more secure when dealing with data from potentially untrusted sources (no implicit import, no __setstate__, __init__ or __setattr__). Chutney Library =============== When reading the following API descriptions, chutney/chutney.h contains the public interface definitions, and should be read in conjunction with this documentation. The source of the Python chutney binding should also be consulted as an example of using the library. The library lives in the "chutney" subdirectory. No defines or platform detection is required (the library detects the floating point format at run-time, however). No provision has been made to build a static or dynamic link library from the source as this process is highly platform and project specific. The chutney API reflects the pickle state machine - reading the pickle documentation (in the Python pickletools module source) will aid understanding the chutney API. The pickle format resembles a simple stack-based virtual machine, with the last item on the stack forming the pickle value (so if you have more than one simple value to pass, they should be contained within a tuple, dict or instance). Generating a "chutney" ---------------------- Allocate a chutney_dump_state structure, then initialise it with chutney_dump_init, passing a write function and write context. The write function should accept the write context, a character pointer to the data to be written (which might contain nulls) and a count of data bytes. For simple objects (null, bool, int, float, string and utf8), simply call the appropriate chutney_save_XXX method, passing the state object and value (where applicable). See the prototypes in chutney/chutney.h for details. When dumping container objects, multiple API calls are required: To save a tuple: chutney_save_mark() save objects in the tuple chutney_save_tuple() To save a dictionary: chutney_save_empty_dict() chutney_save_mark() save up to CHUTNEY_BATCHSIZE key and value objects in pairs chutney_save_setitems If you have more than CHUTNEY_BATCHSIZE items to save, repeat the MARK, CHUTNEY_BATCHSIZE keys and values, SETITEMS as many times as is required. To save an instance: chutney_save_mark() chutney_save_global() passing module name and class name chutney_save_obj() save instance dictionary chutney_save_build() When complete, chutney_save_stop() must be called. chutney_dump_dealloc() should then be called to release any storage referenced by the state object (but this does not deallocate the state object itself). If any of the chutney_save_XXX methods return < 0, an error has occurred, and further method calls will have undefined results. Loading a "chutney" ------------------- To load a chutney, allocate a chutney_load_state structure, then initialise it with chutney_load_init, passing a chutney_load_callbacks structure. The chutney_load_callbacks structure contains pointers to functions the chutney parser can call to allocate, deallocate and manipulate application objects. As data becomes available, the "chutney_load" method should then be called. Note that chutney_load updates the passed data and length arguments to reflect the number of bytes consumed in parsing. "chutney_load" returns a chutney_status enum indicating the phase of the state machine: CHUTNEY_OKAY The chutney/pickle has been fully parsed and can be retrieved with chutney_load_result(). The data and length passed to chutney_load() are updated to point to any data remaining in the passed buffer. CHUTNEY_CONTINUE Parsing is not yet complete - more data is required. CHUTNEY_NOMEM A memory allocation has failed. CHUTNEY_PARSE_ERR CHUTNEY_STACK_ERR CHUTNEY_OPCODE_ERR CHUTNEY_NOMARK_ERR Flags errors in the structure of the pickle being parsed. CHUTNEY_CALLBACK_ERR A callback has flagged an error. The parser will call the callbacks as it finds objects in the data stream. The make_XXX callbacks should return a pointer to an opaque object, other callbacks typically return 0 to indicate success or -1 to indicate failure. The parser assumes responsibility for the opaque objects returned from the make_XXX callbacks, and will ultimately either call the "dealloc" callback with the pointer, pass it to one of the collection manipulation callbacks (make_tuple, dict_setitems), or return it as the load result from chutney_load_result(). In all these cases, when the object is passed back, the user is again responsible for deallocating the storage (or adjusting reference counts, etc), even if the container object cannot be created. The callback methods closely reflect the saving methods, although there are a few differences for the sake of convenience: * dealloc - called to deallocate an opaque application pointer representing an object created by one of the allocating callbacks. * make_null - called to allocate a NONE (null) object, no arguments are passed. * make_bool - called to allocate a boolean object, argument is a C int (0 or 1). * make_int - called to allocate an integer object, argument is a C int. * make_float - called to allocate a floating point object, argument is a C double. * make_string - called to allocate a string object, arguments are a const char pointer and a long count (strings are 8 bit clean, and can contain \0). * make_unicode - called to allocate a unicode object, arguments are a UTF-8 encoded char pointer and a long count. * make_tuple - called to allocate a tuple (or other ordered container), arguments are a void * array, and a count of entries in the array. The user assumes responsibility for all the objects in the array for the purposes of garbage collection, etc. * make_empty_dict - called to allocate an empty dictionary (or mapping), no arguments are passed. * dict_setitems - called to add items to a dictionary previously allocated by make_empty_dict, arguments are the dict, a void * array of key and value pairs to be added, and a count of keys and values (items * 2). The user assumes responsibility for the keys and values (but not the dict). * get_global - called to get a reference to a "global" object, arguments are const char pointers to the module name, and the name of the global object within that module. The parser does not care what this object represents - the returned opaque value is typically passed to other callbacks, such as make_object. * make_object - called to allocate an object, argument is an opaque pointer to a "global" object (typically the class). * object_build - called to add attributes to an object previously allocated via make_object, arguments are the object and a dictionary of attributes. The user assumes responsibility for the passed dictionary, but not the object. After successfully or unsuccessfully parsing a pickle, chutney_load_dealloc should be called to deallocate any storage referenced by chutney_load_state (note, however, that it does not deallocate the chutney_load_state structure itself). EOF
About
Chutney is a portable C library implementing a subset of the Python "pickle" serialisation protocol, and a Python binding for the library.
Resources
Stars
Watchers
Forks
Packages 0
No packages published