forked from indeyets/syck
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.EXT
444 lines (307 loc) · 13.5 KB
/
README.EXT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
$Id$
This is the documentation for libsyck and describes how to extend it.
= Overview =
Syck is designed to take a YAML stream and a symbol table and move
data between the two. Your job is to simply provide callback functions which
understand the symbol table you are keeping.
Syck also includes a simple symbol table implementation.
== About the Source ==
The Syck distribution is laid out as follows:
lib/ libsyck source (core API)
bytecode.re lexer for YAML bytecode (re2c)
emitter.c emitter functions
gram.y grammar for YAML documents (bison)
handler.c internal handlers which glue the lexer and grammar
implicit.re lexer for builtin YAML types (re2c)
node.c node allocation and access
syck.c parser funcs, central funcs
syck.h libsyck definitions
syck_st.c symbol table functions
syck_st.h symbol table definitions
token.re lexer for YAML plaintext (re2c)
yaml2byte.c simple bytecode emitter
ext/ ruby, python, php, cocoa extensions
tests/ unit tests for libsyck
YTS.c.rb generates YAML Testing Suite unit test
(use: ruby YTS.c.rb > YTS.c)
Basic.c allocation and buffering tests
Parse.c parser sanity
Emit.c emitter sanity
== Using SyckNodes ==
The SyckNode is the structure which YAML data is loaded into
while parsing. It's also a good structure to use while emitting,
however you may choose to emit directly from your native types
if your extension is very small.
SyckNodes are designed to be used in conjunction with a symbol
table. More on that in a moment. For now, think of a symbol
table as a library which stores nodes, assigning each node a
unique identifier.
This identifier is called the SYMID in Syck. Nodes refer to
each other by SYMIDs, rather than pointers. This way, the
nodes can be free'd as the parser goes.
To be honest, SYMIDs are used because this is the way Ruby
works. And this technique means Syck can use Ruby's symbol
table directly. But the included symbol table is lightweight,
solves the problem of keeping too much data in memory, and
simply pairs SYMIDs with your native object type (such as
PyObject pointers.)
Three kinds of SyckNodes are available:
1. scalar nodes (syck_str_kind):
These nodes store a string, a length for the string
and a style (indicating the format used in the YAML
document).
2. sequence nodes (syck_seq_kind):
Sequences are YAML's array or list type.
These nodes store a list of items, which allocation
is handled by syck functions.
3. mapping nodes (syck_map_kind):
Mappings are YAML's dictionary or hashtable type.
These nodes store a list of pairs, which allocation
is handled by syck functions.
The syck_kind_tag enum specifies the above enumerations,
which can be tested against the SyckNode.kind field.
PLEASE leave the SyckNode.shortcut field alone!! It's
used by the parser to workaround parser ambiguities!!
=== Node API ===
SyckNode *
syck_alloc_str()
syck_alloc_seq()
syck_alloc_str()
Allocates a node of a given type and initializes its
internal union to emptiness. When left as-is, these
nodes operate as a valid empty string, empty sequence
and empty map.
Remember that the node's id (SYMID) isn't set by the
allocation functions OR any other node functions herein.
It's up to your handler function to do that.
void
syck_free_node( SyckNode *n )
While the Syck parser will free nodes it creates, use
this to free your own nodes. This function will free
all of its internals, its type_id and its anchor. If
you don't need those members free, please be sure they
are set to NULL.
SyckNode *
syck_new_str( char *str, enum scalar_style style )
syck_new_str2( char *str, long len, enum scalar_style style )
Creates scalar nodes from C strings. The first function
will call strlen() to determine length.
void
syck_replace_str( SyckNode *n, char *str, enum scalar_style style )
syck_replace_str2( SyckNode *n, char *str, long len, enum scalar_style style )
Replaces the string content of a node `n', while keeping
the node's type_id, anchor and id.
char *
syck_str_read( SyckNode *n )
Returns a pointer to the null-terminated string inside scalar node
`n'. Normally, you might just want to use:
char *ptr = n->data.str->ptr
long len = n->data.str->len
SyckNode *
syck_new_map( SYMID key, SYMID value )
Allocates a new map with an initial pair of nodes.
void
syck_map_empty( SyckNode *n )
Empties the set of pairs for a mapping node.
void
syck_map_add( SyckNode *n, SYMID key, SYMID value )
Pushes a key-value pair on the mapping. While the ordering
of pairs DOES affect the ordering of pairs on output, loaded
nodes are deliberately out of order (since YAML mappings do
not preserve ordering.)
See YAML's builtin !omap type for ordering in mapping nodes.
SYMID
syck_map_read( SyckNode *n, enum map_part, long index )
Loads a specific key or value from position `index' within
a mapping node. Great for iteration:
for ( i = 0; i < syck_map_count( n ); i++ ) {
SYMID key = sym_map_read( n, map_key, i );
SYMID val = sym_map_read( n, map_value, i );
}
void
syck_map_assign( SyckNode *n, enum map_part, long index, SYMID id )
Replaces a specific key or value at position `index' within
a mapping node. Useful for replacement only, will not allocate
more room when assigned beyond the end of the pair list.
long
syck_map_count( SyckNode *n )
Returns a count of the pairs contained by the mapping node.
void
syck_map_update( SyckNode *n, SyckNode *n2 )
Combines all pairs from mapping node `n2' into mapping node
`n'.
SyckNode *
syck_new_seq( SYMID val )
Allocates a new seq with an entry `val'.
void
syck_seq_empty( SyckNode *n )
Empties a sequence node `n'.
void
syck_seq_add( SyckNode *n, SYMID val )
Pushes a new item `val' onto the end of the sequence.
void
syck_seq_assign( SyckNode *n, long index, SYMID val )
Replaces the item at position `index' in the sequence
node with item `val'. Useful for replacement only, will not allocate
more room when assigned beyond the end of the pair list.
SYMID
syck_seq_read( SyckNode *n, long index )
Reads the item at position `index' in the sequence node.
Again, for iteration:
for ( i = 0; i < syck_seq_count( n ); i++ ) {
SYMID val = sym_seq_read( n, i );
}
long
syck_seq_count( SyckNode *n )
Returns a count of items contained by sequence node `n'.
== YAML Parser ==
Syck's YAML parser is extremely simple. After setting up a
SyckParser struct, along with callback functions for loading
node data, use syck_parse() to start reading data. Since
syck_parse() only reads single documents, the stream can be
managed by calling syck_parse() repeatedly for an IO source.
The parser has four callbacks: one for reading from the IO
source, one for handling errors that show up, one for
handling nodes as they come in, one for handling bad
anchors in the document. Nodes are loaded in the order they
appear in the YAML document, however nested nodes are loaded
before their parent.
=== How to Write a Node Handler ===
Inside the node handler, the normal process should be:
1. Convert the SyckNode data to a structure meaningful
to your application.
2. Check for the bad anchor caveat described in the
next section.
3. Add the new structure to the symbol table attached
to the parser. Found at parser->syms.
4. Return the SYMID reserved in the symbol table.
=== Nodes and Memory Allocation ===
One thing about SyckNodes passed into your handler:
Syck WILL free the node once your handler is done with it.
The node is temporary. So, if you plan on keeping a node
around, you'll need to make yourself a new copy.
And you'll probably need to reassign all the items
in a sequence and pairs in a map. You can do this
with syck_seq_assign() and syck_map_assign(). But, before
you do that, you might consider using your own node structure
that fits your application better.
=== A Note About Anchors in Parsing ===
YAML anchors can be recursive. This means deeper alias nodes
can be loaded before the anchor. This is the trickiest part
of the loading process.
Assuming this YAML document:
--- &a [*a]
The loading process is:
1. Load alias *a by calling parser->bad_anchor_handler, which
reserves a SYMID in the symbol table.
2. The `a' anchor is added to Syck's own anchor table,
referencing the SYMID above.
3. When the anchor &a is found, the SyckNode created is
given the SYMID of the bad anchor node above. (Usually
nodes created at this stage have the `id' blank.)
4. The parser->handler function is called with that node.
Check for node->id in the handler and overwrite the
bad anchor node with the new node.
=== Parser API ===
See <syck.h> for layouts of SyckParser and SyckNode.
SyckParser *
syck_new_parser()
Creates a new Syck parser.
void
syck_free_parser( SyckParser *p )
Frees the parser, as well as associated symbol tables
and buffers.
void
syck_parser_implicit_typing( SyckParser *p, int on )
Toggles implicit typing of builtin YAML types. If
this is passed a zero, YAML builtin types will be
ignored (!int, !float, etc.) The default is 1.
void
syck_parser_taguri_expansion( SyckParser *p, int on )
Toggles expansion of types in full taguri. This
defaults to 1 and is recommended to stay as 1.
Turning this off removes a layer of abstraction
that will cause incompatibilities between YAML
documents of differing versions.
void
syck_parser_handler( SyckParser *p, SyckNodeHandler h )
Assign a callback function as a node handler. The
SyckNodeHandler signature looks like this:
SYMID node_handler( SyckParser *p, SyckNode *n )
void
syck_parser_error_handler( SyckParser *p, SyckErrorHandler h )
Assign a callback function as an error handler. The
SyckErrorHandler signature looks like this:
void error_handler( SyckParser *p, char *str )
void
syck_parser_bad_anchor_handler( SyckParser *p, SyckBadAnchorHandler h )
Assign a callback function as a bad anchor handler.
The SyckBadAnchorHandler signature looks like this:
SyckNode *bad_anchor_handler( SyckParser *p, char *anchor )
void
syck_parser_file( SyckParser *p, FILE *f, SyckIoFileRead r )
Assigns a FILE pointer as an IO source and a callback function
which handles buffering of that IO source.
The SyckIoFileRead signature looks like this:
long SyckIoFileRead( char *buf, SyckIoFile *file, long max_size, long skip );
Syck comes with a default FILE handler named `syck_io_file_read'. You
can assign this default handler explicitly or by simply passing in NULL
as the `r' parameter.
void
syck_parser_str( SyckParser *p, char *ptr, long len, SyckIoStrRead r )
Assigns a string as the IO source with a callback function `r'
which handles buffering of the string.
The SyckIoStrRead signature looks like this:
long SyckIoFileRead( char *buf, SyckIoStr *str, long max_size, long skip );
Syck comes with a default string handler named `syck_io_str_read'. You
can assign this default handler explicitly or by simply passing in NULL
as the `r' parameter.
void
syck_parser_str_auto( SyckParser *p, char *ptr, SyckIoStrRead r )
Same as the above, but uses strlen() to determine string size.
SYMID
syck_parse( SyckParser *p )
Parses a single document from the YAML stream, returning the SYMID for
the root node.
== YAML Emitter ==
Since the YAML 0.50 release, Syck has featured a new emitter API. The idea
here is to let Syck figure out shortcuts that will clean up output, detect
builtin YAML types and -- especially -- determine the best way to format
outgoing strings.
The trick with the emitter is to learn its functions and let it do its
job. If you don't like the formatting Syck is producing, please get in
contact the author and pitch your ideas!!
Like the YAML parser, the emitter has a couple of callbacks: namely,
one for IO output and one for handling nodes. Nodes aren't necessarily
SyckNodes. Since we're ultimately worried about creating a string, SyckNodes
become sort of unnecessary.
=== The Emitter Process ===
1. Traverse the structure you will be emitting, registering all nodes
with the emitter using syck_emitter_mark_node(). This step will
determine anchors and aliases in advance.
2. Call syck_emit() to begin emitting the root node.
3. Within your emitter handler, use the syck_emit_* convenience methods
to build the document.
4. Call syck_emit_flush() to end the document and push the remaining
document to the IO stream. Or continue to add documents to the output
stream with syck_emit().
=== Emitter API ===
See <syck.h> for the layout of SyckEmitter.
SyckEmitter *
syck_new_emitter()
Creates a new Syck emitter.
SYMID
syck_emitter_mark_node( SyckEmitter *e, st_data_t node )
Adds an outgoing node to the symbol table, allocating an anchor
for it if it has repeated in the document and scanning the type
tag for auto-shortcut.
void
syck_output_handler( SyckEmitter *e, SyckOutputHandler out )
Assigns a callback as the output handler.
void *out_handler( SyckEmitter *e, char * ptr, long len );
Receives the emitter object, pointer to the buffer and a count
of bytes which should be read from the buffer.
void
syck_emitter_handler( SyckEmitter *e, SyckEmitterHandler
void
syck_free_emitter