========================
eZ Components - Document
========================
.. contents:: Table of Contents
:depth: 3
Introduction
============
The document component offers transformations between different semantic markup
languages, like:
- `ReStructured text`__
- `XHTML`__
- `Docbook`__
- `eZ Publish XML markup`__
- Wiki markup languages, like: Creole__, Dokuwiki__ and Confluence__
- `Open Document Text`__ as used by `OpenOffice.org`__ and other office suites
As shown in figure 1, each format supports conversions from and to docbook
as a central intermediate format and may implement additional shortcuts for
conversions from and to other formats. Not every format can express the same
semantics, so some information may be lost, which is `documented in a
dedicated document`__.
.. figure:: img/document-architecture.png
:alt: Conversion architecture in document component
Figure 1: Conversion architecture in document component
There are central handler classes for each markup language, which follow a
common conversion interface ezcDocument and all implement the methods
getAsDocbook() and createFromDocbook().
Additionally the document component can render documents in the following
output formats. These formats can only be generated, not read:
- PDF
__ http://docutils.sourceforge.net/rst.html
__ http://www.w3.org/TR/xhtml1/
__ http://www.docbook.org/
__ http://doc.ez.no/eZ-Publish/Technical-manual/4.x/Reference/XML-tags
__ http://www.wikicreole.org/
__ http://www.dokuwiki.org/dokuwiki
__ http://confluence.atlassian.com/renderer/notationhelp.action?section=all
__ http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
__ http://www.openoffice.org/
__ Document_conversion.html
Markup languages
================
The following markup languages are currently handled by the document
component.
ReStructured text
-----------------
`ReStructured Text`__ (RST) is a simple text based markup language, intended
to be easy to read and write by humans. Examples can be found in the
`documentation of RST`__.
The transformation of a simple RST document to docbook can be done just like
this:
.. include:: tutorial/00_00_convert_rst.php
:literal:
In line 3 the document is actually loaded and parsed into an internal abstract
syntax tree. In line 5 the internal structure is then transformed back to a
docbook document. In the last line the resulting document is returned as a
string, so that you can echo or store it.
__ http://docutils.sourceforge.net/rst.html
__ http://docutils.sourceforge.net/docs/user/rst/quickstart.html
Error handling
^^^^^^^^^^^^^^
By default each parsing or compiling error will be transformed into an
exception, so that you are notified of those errors. The error reporting
settings can be modified, as for all other document handlers::
<?php
$document = new ezcDocumentRst();
$document->options->errorReporting = E_PARSE | E_ERROR | E_WARNING;
$document->loadFile( '../tutorial.txt' );
$docbook = $document->getAsDocbook();
echo $docbook->save();
?>
The setting in line 3 causes only warnings, errors and fatal errors to be
transformed into exceptions, while notices are still collected but otherwise
ignored. This setting affects both the parsing of the source document and the
compiling into the destination language.
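The mask in line 3 is an ordinary PHP bit field built from the predefined E_*
constants. The following standalone sketch does not call the document
component at all; the helper name isSelected() is made up for this
illustration, and how the component evaluates the mask internally is not shown
here. It only demonstrates which levels such a mask contains:

```php
<?php
// Hypothetical helper, not part of the document component: checks whether
// a given error level is contained in an error reporting bit mask.
function isSelected( $mask, $level )
{
    return ( $mask & $level ) === $level;
}

// The same mask as used in the example above.
$mask = E_PARSE | E_ERROR | E_WARNING;

var_dump( isSelected( $mask, E_WARNING ) ); // bool(true)
var_dump( isSelected( $mask, E_NOTICE ) );  // bool(false)
```

Adding E_NOTICE to the mask with another bitwise OR would select notices as
well.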
Directives
^^^^^^^^^^
`RST directives`__ are elements in the RST documents with parameters, optional
named options and optional content. The document component implements a well
known subset of the `directives implemented in the docutils RST parser`__. You
may register custom directive handlers, or overwrite existing directive
handlers using your own implementation. A directive in RST markup with
parameters, options and content could look like::

    My document
    ===========

    The custom directive:

    .. my_directive:: parameters
        :option: value

        Some indented text...

For such a directive you should register a handler on the RST document, like::
<?php
$document = new ezcDocumentRst();
$document->registerDirective( 'my_directive', 'myCustomDirective' );
$document->loadFile( $from );
$docbook = $document->getAsDocbook();
$xml = $docbook->save();
?>
The class myCustomDirective must extend the class ezcDocumentRstDirective, and
implement the method toDocbook(). For rendering you get access to the full AST,
the contents of the current directive and the base path where the document
resides in the file system, which is necessary for accessing external files.
__ http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#directives
__ http://docutils.sourceforge.net/docs/ref/rst/directives.html
Directive example
`````````````````
A full example for a custom directive, where we want to embed real world
addresses into our RST document and maintain the semantics in the resulting
docbook, could look like::
Address example
===============
.. address:: John Doe
:street: Some Lane 42
We would possibly add more information, like the ZIP code, city and state, but
skip this to keep the code short. The implemented directive then just needs to
take this information and transform it into valid docbook XML using the DOM
extension.
.. include:: tutorial/00_01_address_directive.php
:literal:
The AST node, which should be rendered, is passed to the constructor of the
custom directive visitor and available in the class property $node. The
complete DOMDocument and the current DOMNode are passed to the method. In this
case we just create an `address node`__ with the optional child nodes street and
personname, depending on the existence of the respective values.
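As a rough, library-independent sketch of the DOM calls involved, the
following builds such an address fragment with PHP's DOM extension only. The
function name buildAddress() and its parameters are assumptions made for this
illustration; a real directive would do the same work inside its toDocbook()
method, using the values from its AST node:

```php
<?php
// Hypothetical helper for illustration only, not part of the component.
function buildAddress( DOMDocument $document, DOMNode $root, $name, $street )
{
    $address = $document->createElement( 'address' );
    $root->appendChild( $address );

    // Both child nodes are optional and only added for non-empty values.
    foreach ( array( 'personname' => $name, 'street' => $street ) as $tag => $value )
    {
        if ( $value === '' )
        {
            continue;
        }

        $child = $document->createElement( $tag );
        // Text nodes take care of escaping special characters.
        $child->appendChild( $document->createTextNode( $value ) );
        $address->appendChild( $child );
    }

    return $address;
}

$document = new DOMDocument();
$section  = $document->createElement( 'section' );
$document->appendChild( $section );

buildAddress( $document, $section, 'John Doe', 'Some Lane 42' );
echo $document->saveXML( $section ), "\n";
```

Creating the text through createTextNode() instead of passing it to
createElement() avoids problems with characters like "&" in names or streets.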
You can now render the RST document after you have registered your custom
directive handler as shown above:
.. include:: tutorial/00_02_custom_directive.php
:literal:
The output will then look like::

    <?xml version="1.0"?>
    <article xmlns="http://docbook.org/ns/docbook">
      <section id="address_example">
        <sectioninfo/>
        <title>Address example</title>
        <address>
          <personname> John Doe</personname>
          <street> Some Lane 42</street>
        </address>
      </section>
    </article>

__ http://docbook.org/tdg/en/html/address.html
XHTML rendering
^^^^^^^^^^^^^^^
For RST a conversion shortcut has been implemented, so that you don't need to
convert the RST to docbook and the docbook to XHTML. This saves conversion time
and helps you avoid information loss during multiple conversions::
<?php
$document = new ezcDocumentRst();
$document->loadFile( $from );
$xhtml = $document->getAsXhtml();
$xml = $xhtml->save();
?>
The default XHTML compiler generates complete XHTML documents, including header
and meta-data in the header. If you want to in-line the result, you may specify
another XHTML compiler, which just creates an XHTML block level element that
can be embedded in your source code::
<?php
$document = new ezcDocumentRst();
$document->options->xhtmlVisitor = 'ezcDocumentRstXhtmlBodyVisitor';
$document->loadFile( $from );
$xhtml = $document->getAsXhtml();
$xml = $xhtml->save();
?>
You can of course also use the predefined and custom directives for XHTML
rendering. The directives used during XHTML generation also need to implement
the interface ezcDocumentRstXhtmlDirective.
Modification of XHTML rendering
```````````````````````````````
You can modify the generated output of the XHTML visitor by creating a custom
visitor for the RST AST. The easiest way is probably to extend one of the
existing XHTML visitors and reuse it. For example you may want to fill the
type attribute in bullet lists, as known from HTML, although it isn't valid
XHTML::

    class myDocumentRstXhtmlVisitor extends ezcDocumentRstXhtmlVisitor
    {
        protected function visitBulletList( DOMNode $root, ezcDocumentRstNode $node )
        {
            $list = $this->document->createElement( 'ul' );
            $root->appendChild( $list );

            $listTypes = array(
                '*' => 'circle',
                '+' => 'disc',
                '-' => 'square',
                "\xe2\x80\xa2" => 'disc',
                "\xe2\x80\xa3" => 'circle',
                "\xe2\x81\x83" => 'square',
            );
            // Not allowed in XHTML strict
            $list->setAttribute( 'type', $listTypes[$node->token->content] );

            // Decorate list contents
            foreach ( $node->nodes as $child )
            {
                $this->visitNode( $list, $child );
            }
        }
    }

The structure, which is not enforced for visitors, but used in the docbook and
XHTML visitors, is to call special methods for each node type in the AST to
decorate the AST recursively. This method will be called for all bullet list
nodes in the AST which contain the actual list items. As the first parameter
the current position in the XHTML DOM tree is also provided to the method.
To create the XHTML we can now just create a new list node (<ul>) in the
current DOMNode, set the new attribute, and recursively decorate all
descendants using the general visitor dispatching method visitNode() for all
children in the AST. For the AST children being also rendered as children in
the XML tree, we pass the just created DOMNode (<ul>) as the new root node to
the visitNode() method.
After defining such a class, you can use the custom visitor as shown
above::
<?php
$document = new ezcDocumentRst();
$document->options->xhtmlVisitor = 'myDocumentRstXhtmlVisitor';
$document->loadFile( $from );
$xhtml = $document->getAsXhtml();
$xml = $xhtml->save();
?>
Now the lists in the generated XHTML will also have the type attribute set.
Writing RST
^^^^^^^^^^^
Writing a RST document from an existing docbook document, or an
ezcDocumentDocbook object generated from some other source, is trivial:
.. include:: tutorial/00_03_write_rst.php
:literal:
For the conversion internally the ezcDocumentDocbookToRstConverter class is
used, which can also be called directly, like::
$converter = new ezcDocumentDocbookToRstConverter();
$rst = $converter->convert( $docbook );
Using this you can configure the converter to your wishes, or extend the
converter to handle yet unhandled docbook elements. The converter is, as usual,
configured using its options property, and the options are defined in the
ezcDocumentDocbookToRstConverterOptions class. There you may configure the
header underlines used, the bullet types or the line wrapping.
Extending RST writing
`````````````````````
As said before, not all existing docbook elements might already be handled by
the converter. But its handler based mechanism makes it easy to extend or
overwrite existing behaviour.
Similar to the example above we can convert the <address> docbook element back
to the address RST directive.
.. include:: tutorial/00_04_address_element.php
:literal:
The handler classes are assigned to XML elements in some namespace, "docbook"
in this case. It is registered in line 18 for the element "address". The class
itself has to extend from the ezcDocumentElementVisitorHandler class, which is
in this case already extended by ezcDocumentDocbookToRstBaseHandler, which
provides some convenience methods for RST creation, like renderDirective() used
in this example.
The handler is called, whenever the element, it has been registered for, occurs
in the docbook XML tree. In this case it has to append the generated RST part
for this element to the RST document - and may call the general conversion
handler again for its child elements. This example converts the above shown
docbook XML back to::
.. _address_example:
===============
Address example
===============
.. address::
John Doe
Some Lane 42
This ignores any special address sub elements to keep the example simple. For
more examples of element handlers check the existing implementations.
XHTML
-----
Converting XHTML or HTML to a document markup language is a non trivial task,
because XHTML elements are often used for layout, ignoring the actual semantics
of the element. Therefore the document component allows you to stack a set of
filters, each of which performs a specific conversion task. The default filter
stack may work fine, but you may want to also implement custom filters
depending on the contents of the filtered website, or to cover additional
sources of meta data information, like RDF, Microformats or similar.
The available filters are:
- ezcDocumentXhtmlElementFilter
This filter just maintains the common semantics of XHTML elements by
converting them to their docbook equivalents. It ignores common class names.
This filter is the most basic and you probably want to always add this one to
the filter stack.
- ezcDocumentXhtmlXpathFilter
The XPath filter takes an XPath expression to locate the root of the document
contents. It makes no sense to use this one together with the content locator
filter. This is a more static, but also more precise way to tell the
converter where to find the actual contents.
- ezcDocumentXhtmlMetadataFilter
This filter extracts common meta data from the XHTML head, and converts it
into docbook section info elements.
- ezcDocumentXhtmlTablesFilter
HTML tables are especially often used for layout markup. This filter takes a
threshold, and if the table text factor drops below this threshold the table
is ignored. The same is true for stacked tables.
- ezcDocumentXhtmlContentLocatorFilter
The content locator filter tries to find the actual article in the markup of
a website, ignoring the surrounding layout markup. This seems to work well
for example for common news sites.
By default just the element and meta data filters are used. So the conversion
of a common website, like the `introduction article`__ from ezcomponents.org,
results in a docbook document containing all lists for the navigation, etc.
.. include:: tutorial/01_00_read_html.php
:literal:
So let's additionally use the XPath filter to pass the location of the actual
content to the conversion:
.. include:: tutorial/01_01_read_html_filtered.php
:literal:
With this additional filter, the contents are correctly found and converted
properly.
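To get a feeling for what such an XPath expression selects, you can try it
with PHP's own DOM extension first, independent of the document component.
The sample markup and the expression used below are invented for this sketch:

```php
<?php
// Invented sample page: navigation plus the actual article content.
$html = '<html><body>'
      . '<ul class="menu"><li>Home</li><li>News</li></ul>'
      . '<div class="content"><h1>Title</h1><p>Article text.</p></div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML( $html );

// Locate the root of the real content, ignoring the navigation list.
$xpath = new DOMXPath( $dom );
$nodes = $xpath->query( '//div[@class="content"]' );

echo $nodes->length, "\n"; // 1
echo $nodes->item( 0 )->textContent, "\n";
```

If the query matches exactly the element wrapping the article, the same
expression is a good candidate for the XPath filter.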
__ http://ezcomponents.org/introduction
Writing XHTML
^^^^^^^^^^^^^
Writing XHTML from docbook is very similar to the approach used for writing
RST: It uses the same handler based mechanism, so you may want to check that
chapter to learn how to extend it for unhandled docbook elements.
.. include:: tutorial/01_02_write_html.php
:literal:
As you can see, this works the same way as the conversion from Docbook to any
other format.
HTML styles
^^^^^^^^^^^
By default inline CSS is embedded in all generated HTML, to create a more
appealing default experience. This may of course be deactivated and you may
also reference custom style sheets to be included in the generated HTML.
.. include:: tutorial/01_03_write_html_styled.php
:literal:
For this we again use the converter directly to be able to configure it as we
like.
eZ XML
------
eZ XML describes the markup format used internally by `eZ Publish`__ for
storing markup in content objects. The format is roughly specified in the `eZ
Publish documentation`__.
Modules often register custom elements, which are not specified anywhere, so
there might be several elements not handled by default.
__ http://ez.no/ezpublish
__ http://ez.no/doc/ez_publish/technical_manual/4_0/reference/xml_tags
Reading eZ XML
^^^^^^^^^^^^^^
Reading eZ XML is basically the same as for all other formats:
.. include:: tutorial/02_00_read_ezxml.php
:literal:
As always the document object is either constructed from an input string or
file. To convert into docbook you may just use the method getAsDocbook().
Link handling
`````````````
Inside eZ XML documents link URIs are replaced with IDs, which reference the
links inside the eZ Publish database, to ensure that a changed link is updated
globally. The replacing of such links is handled by a class extending from
ezcDocumentEzXmlLinkProvider. By default dummy URLs are added to the documents.
URLs are either referenced directly by their ID, a node ID, or an object ID.
Those parameters are passed to the link provider, which then should return a
URL for that.
.. include:: tutorial/02_01_link_provider.php
:literal:
The link provider is only implemented as a trivial stub, but you can establish
a database connection there and actually fetch the required data. In this case
the generated docbook document looks like::

    <?xml version="1.0"?>
    <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
    <article xmlns="http://docbook.org/ns/docbook">
      <section>
        <title>Paragraph</title>
        <para>Some content, with a <ulink url="http://host/path/1">link</ulink>.</para>
      </section>
    </article>

The link provider is again set as an option of the converter. As shown for the
docbook conversions of the other handlers, you can register element handlers
for yet unhandled eZ XML elements on the converter, too.
Writing eZ XML
^^^^^^^^^^^^^^
Writing eZ XML works nearly the same as reading. It again uses an XML based
element handler, as shown in more detail for the Docbook to RST conversion.
For the link conversion an object extending from ezcDocumentEzXmlLinkConverter
is used, which returns an array with the attributes of the link in the eZ XML
document.
Wiki markup
-----------
Wiki markup has no central standard, but is used as a term to describe some
common subset with lots of different extensions. Most wiki markup languages
only support a quite trivial markup with severe limitations on the recursion of
markup blocks. For example no markup really supports tables containing lists,
and especially not tables containing other tables.
The document component implements a generic parser to support multiple wiki
markup languages. For each different markup syntax a tokenizer has to be
implemented, which converts the implemented markup into a unified token stream,
which can then be handled by the generic parser.
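The idea of a unified token stream can be illustrated with a toy example. The
function below is not one of the component's tokenizers; its name, the token
array layout and the restriction to bold markers are all assumptions made for
this sketch. It only shows how two different syntaxes can end up as the same
tokens, so that one parser can handle both:

```php
<?php
// Toy illustration only: splits a text into "bold" marker tokens and plain
// text tokens. Real tokenizers handle far more markup than this.
function tokenize( $text, $boldMarker )
{
    $parts = preg_split(
        '/(' . preg_quote( $boldMarker, '/' ) . ')/',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $tokens = array();
    foreach ( $parts as $part )
    {
        $tokens[] = ( $part === $boldMarker )
            ? array( 'type' => 'bold' )
            : array( 'type' => 'text', 'content' => $part );
    }
    return $tokens;
}

// Creole marks bold text with "**", Confluence with a single "*".
$creole     = tokenize( 'Some **bold** text', '**' );
$confluence = tokenize( 'Some *bold* text', '*' );

var_dump( $creole === $confluence ); // bool(true)
```

Since both inputs produce identical tokens, everything after the tokenizer can
stay unaware of the markup language that was read.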
The document component currently supports reading three wiki markup languages,
but new ones can be added easily by implementing another tokenizer. Supported
are:
- Creole__, developed by an initiative with the intention to create a unified
wiki markup standard. This is the default wiki language, and currently the
only one which can be written.
Creole currently only supports a very limited set of markup__; all further
markup additions are still under discussion.
- Dokuwiki__ is a popular wiki system, for example used on `wiki.php.net`__
with a quite different syntax, and the most complete markup support, even
including something like footnotes.
- Confluence__ is a common Java based wiki with an entirely different and most
uncommon syntax, which has mainly been implemented to prove the generic
nature of the parser.
All markup languages are tested against all examples from the respective
markup language documentation, but there might still be cases where the parser
of the original wiki implementation behaves slightly differently from the
implementation in the document component.
__ http://www.wikicreole.org/
__ http://www.wikicreole.org/wiki/Elements
__ http://www.dokuwiki.org/dokuwiki
__ http://wiki.php.net/
__ http://confluence.atlassian.com/renderer/notationhelp.action?section=all
Reading wiki markup
^^^^^^^^^^^^^^^^^^^
Reading wiki texts basically works like for any other markup language:
.. include:: tutorial/03_00_read_wiki.php
:literal:
As said, by default the Creole tokenizer is used. The same result can be
produced with a different wiki markup, Confluence in this example, by switching
the tokenizer:
.. include:: tutorial/03_01_read_wiki_confluence.php
:literal:
Writing wiki markup
^^^^^^^^^^^^^^^^^^^
Until now only writing of creole wiki markup is supported. Since creole does
not support a lot of the markup available in docbook, not all documents might
get converted properly. Because it does not even support explicit internal
references, we cannot even simulate footnotes like in HTML.
If you want to add support for such conversions, it works exactly like the
docbook RST conversion and can be extended the same way.
.. include:: tutorial/03_02_write_wiki.php
:literal:
PDF
---
PDF (Portable Document Format) has been developed to provide a document
format which can be presented independently of software and system. Because of
this it is often used as a pre-print document exchange format.
The document component can generate PDF documents from all other input formats
and offers a language very similar to CSS to apply custom styling to the
generated output. Additionally it supports adding custom parts, like footers
and headers, to the PDF document.
Reading PDF
^^^^^^^^^^^
The document component for now does not support reading PDF documents.
Writing PDF
^^^^^^^^^^^
Writing PDF basically works like writing any other format supported by the
document component, like the basic example shows:
.. include:: tutorial/04_01_create_pdf.php
:literal:
First we include some RST file to create a Docbook file from it, because, like
described before, Docbook is the central conversion format.
Afterwards the Docbook document is loaded by the PDF class and saved. When
converting the document to a string the PDF is rendered using the default
options and the default driver. The result of this rendering call can be
viewed here: `04_01_create_pdf.pdf`__.
__ 04_01_create_pdf.pdf
Output writers
``````````````
Since there are numerous different PDF renderers in the PHP world and the
available ones might depend on the current environment, the document component
supports different PDF drivers, as wrappers around different existing
libraries. For now two implementations exist, for pecl/haru and TCPDF, but it
is fairly easy to write another one for another PDF class.
Haru
""""
libharu__ is an open source PDF generation library, written in C, and wrapped
by the haru PHP extension, available from PECL__. If PEAR is correctly set up
on your machine it should install as easily as::
pear install pecl/haru
The Haru driver is pretty fast, but currently has issues with some special
characters. It is the default driver, but can be explicitly used by setting
the driver option on the PDF class, like::
$pdf = new ezcDocumentPdf();
$pdf->options->driver = new ezcDocumentPdfHaruDriver();
__ http://libharu.org
__ http://pecl.php.net/package/haru
TCPDF
"""""
TCPDF is a pure PHP based PDF generation library, available from
`tcpdf.org`__. To use the TCPDF driver you need to download and include its
main class before rendering the PDF. It supports all aspects of PDF rendering
required by the document component, but has some bad coding practices, like:
- Throws lots of warnings and notices, which you might want to silence by
temporarily changing the error reporting level
- Reads and writes several global variables, which might or might not
interfere with your application code
- Uses eval() in several places, which results in non-cacheable OP-Codes.
The TCPDF driver can be used after including the TCPDF source code, using::
$pdf = new ezcDocumentPdf();
$pdf->options->driver = new ezcDocumentPdfTcpdfDriver();
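The first point in the list above, silencing warnings and notices only while
the PDF is rendered, can be done with plain PHP. The wrapper function below is
a hypothetical helper and not part of the document component; the callback
stands in for the actual rendering call:

```php
<?php
// Hypothetical wrapper: runs a callback with warnings and notices silenced
// and restores the previous error reporting level afterwards, even if the
// callback throws an exception.
function withoutWarnings( $callback )
{
    $previous = error_reporting( E_ALL & ~( E_WARNING | E_NOTICE ) );
    try
    {
        return call_user_func( $callback );
    }
    finally
    {
        error_reporting( $previous );
    }
}

// Stand-in for the actual rendering call, e.g. saving the PDF document.
$result = withoutWarnings( function ()
{
    return 'rendered';
} );
```

Restoring the level in a finally block keeps the rest of the application
unaffected by the temporary change.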
__ http://tcpdf.org
Styling the PDF
```````````````
The PDF output can be styled using a CSS like language, which assigns styles
based on the Docbook XML structure. The default styling rules are defined in
the `default.css`__.
__ https://svn.apache.org/repos/asf/incubator/zetacomponents/trunk/Document/src/pcss/style/default.css
The first and most relevant part is the set of general layout options, which
can be defined for the common article root node in the Docbook XML file. You
can set global font options there, like::

    article {
        // Basic font style definitions
        font-size: "12pt";
        font-family: "serif";
        font-weight: "normal";
        font-style: "normal";
        line-height: "1.4";
        text-align: "left";

        // Basic page layout definitions
        text-columns: "1";
        text-column-spacing: "10mm";

        // General text layout options
        orphans: "3";
        widows: "3";
    }

The meaning of the first set of options should be obvious from CSS. We require
each value to be wrapped by quotes for easier parsing, though.
The second set of options defines options for multi-column layouts, which are
not available on the web, but quite common in generated PDF documents. You can
specify the number of text columns, as well as the distance between the text
columns here.
The third set in this example defines lesser known text layout options like
the handling of `orphans and widows`__, which specify the handling of
overlapping parts of paragraphs on page wrapping.
You can, of course, apply those styles to any elements in your document, using
the common CSS addressing rules, like::
// Emphasis node anywhere in the document
emphasis { ... }
// Title element directly below a section element
section > title { ... }
// Title element anywhere below a section element
section title { ... }
// Title element with the ID "first_title"
title#first_title { ... }
// Title element with the class "foo"
title.foo { ... }
// emphasis node directly below a title with class "foo", anywhere in a
// section with the ID "first"
section#first title.foo > emphasis { ... }
The values and `measures`__ for the properties are very similar to the
properties in CSS. For example the margin and padding properties accept one-
to four-tuples of values, with the same respective meaning as in CSS.
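As a reminder of what the CSS shorthand rules mean for such tuples, the
following sketch expands a tuple into its four sides. The function name
expandTuple() is made up for this illustration and the code is not the
component's own style parser:

```php
<?php
// Hypothetical helper: expands a one- to four-tuple into the four sides,
// following the usual CSS shorthand rules.
function expandTuple( $value )
{
    $parts = preg_split( '/\s+/', trim( $value ) );

    switch ( count( $parts ) )
    {
        case 1: // all four sides
            $parts = array( $parts[0], $parts[0], $parts[0], $parts[0] );
            break;
        case 2: // top and bottom, left and right
            $parts = array( $parts[0], $parts[1], $parts[0], $parts[1] );
            break;
        case 3: // top, left and right, bottom
            $parts = array( $parts[0], $parts[1], $parts[2], $parts[1] );
            break;
        case 4: // top, right, bottom, left
            break;
        default:
            throw new InvalidArgumentException( "Expected one to four values: $value" );
    }

    return array_combine( array( 'top', 'right', 'bottom', 'left' ), $parts );
}

print_r( expandTuple( '22mm 16mm' ) );
```

A value like "22mm 16mm" therefore means 22mm at the top and bottom and 16mm
on the left and right.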
Another central formatting element, which is special to the PDF generation, is
the virtual element "page"::

    page {
        page-size: "A4";
        page-orientation: "portrait";
        padding: "22mm 16mm";
    }

The page-size property accepts several known page size identifiers and the
page-orientation defines the orientation of a page. You can also address any
page directly by its ID, which will be 'page_1' for the first page, or its
class, which will be "right", or "left", depending on the current page number.
A detailed description of all available `PDF style options`__ is available
here__.
__ http://en.wikipedia.org/wiki/Widows_and_orphans
__ Measures_
__ Document_styles.html
__ Document_styles.html
Measures
""""""""
The properties in the PDF component accept different measures, which are:
- "mm", Millimeters, the default measure, if none is specified
- "pt", Points, 72 points per inch
- "px", Pixel, depends on the set resolution, by default also 72 points per
inch
- "in", Inch
The unit "Points" is most common for font sizes, while millimeters or inches
will probably be more useful for page paddings. You are free to choose any of
them and can even combine different units in one tuple, like::

    para {
        // Top margin: 12 mm; Right margin: .1 inch; Bottom margin: 10 points,
        // Left margin: 1 pixel
        margin: "12 .1in 10pt 1px";
    }

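To make the relation between the units concrete, the following sketch
converts such measures into millimeters, using the factors listed above
(72 points per inch, and a resolution of 72 dots per inch for pixels unless
another one is given). The function name and its signature are assumptions
for this illustration; this is not the component's own conversion code:

```php
<?php
// Hypothetical helper: converts a single measure like "10pt" into
// millimeters. $resolution is given in dots per inch.
function measureToMillimeters( $measure, $resolution = 72 )
{
    if ( !preg_match( '/^\s*([0-9]*\.?[0-9]+)\s*(mm|pt|px|in)?\s*$/', $measure, $match ) )
    {
        throw new InvalidArgumentException( "Invalid measure: $measure" );
    }

    $value = (float) $match[1];
    // Millimeters are the default if no unit is specified.
    $unit  = isset( $match[2] ) ? $match[2] : 'mm';

    switch ( $unit )
    {
        case 'in':
            return $value * 25.4;
        case 'pt':
            return $value * 25.4 / 72;
        case 'px':
            return $value * 25.4 / $resolution;
        default:
            return $value;
    }
}

// Convert every value of the margin tuple shown above.
$margin = array_map( 'measureToMillimeters', explode( ' ', '12 .1in 10pt 1px' ) );
print_r( $margin );
```

For the tuple above this results in 12 mm, 2.54 mm, about 3.53 mm and about
0.35 mm.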
PDF parts
`````````
PDF parts are additional parts in a rendered document, like headers and
footers. You can implement and register them yourself, and they are activated
by different triggers, like:
- on document creation
- on page creation
- when a document has been finished
The default implementation for headers and footers is triggered on page
creation and renders the title of the document, its author and a page number
in the header or the footer. To develop a custom PDF part you should extend
from the ezcDocumentPdfPart class.
For the following document we are using a set of custom styles, as well as a
header and a footer to customize the rendered PDF document. The additional
custom CSS changes the default font and the page border:
.. include:: tutorial/custom.css
:literal:
The code using the custom CSS and headers and footers then looks like:
.. include:: tutorial/04_02_create_pdf_styled.php
:literal:
The first part, the creation of a Docbook document from a RST document, is just
the same as in the first example.
Afterwards we load the above mentioned custom.css as an additional style. You
can load as many styles as you want. If multiple styles are loaded, the latter
ones always (partly) redefine the first styles.
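Conceptually, and only for one and the same selector, this partial
redefinition behaves like replacing array values in PHP. The sketch below is
not the component's style handling; the property values for the custom style
are invented:

```php
<?php
// Conceptual sketch only: property sets of one selector, in loading order.
$default = array(
    'font-size'   => '12pt',
    'font-family' => 'serif',
    'text-align'  => 'left',
);
$custom = array(
    'font-family' => 'sans-serif',
);

// Later definitions win; properties they do not mention are kept.
$effective = array_replace( $default, $custom );
print_r( $effective );
```

Only font-family changes here, while font-size and text-align keep the values
from the first style.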
After that two custom PDF parts are registered, using their respective option
class to configure their appearance. The footer should only show the page
number, while the header should display all parts (title and author) except
the page number.
At the end of the example the document is created as usual, and looks like
this: `04_02_create_pdf_styled.pdf`__. Since the source document does not
include any author information, this information is also not rendered in the
header.
__ 04_02_create_pdf_styled.pdf
Hyphenating
```````````
Proper hyphenation is crucial for nice text rendering, especially for justified
paragraph formatting. Since hyphenation is highly language dependent you can
create and use your own custom hyphenator. The default one doesn't do any
hyphenation, but just keeps every word as it is.
Custom hyphenators can be implemented by extending from the abstract class
ezcDocumentPdfHyphenator. They only need to implement one method,
``splitWord()``, which should return possible splitting points of the given
word, as documented in the ezcDocumentPdfHyphenator class.
The custom hyphenator can be configured in the ezcDocumentPdfOptions class,
like this::
$pdf = new ezcDocumentPdf();
$pdf->options->hyphenator = new myHyphenator();
The hyphenator will then be used by all text renderers during the rendering
process.
Open Document Text
------------------
The Open Document Text (ODT) format is natively provided by the
`OpenOffice.org`__ office application suite and supported by other common word
processing tools. The Document component supports importing, exporting and
styling of ODT files.
.. note:: Currently only import and export of flat ODT (.fodt) files is
   possible. These can be processed by OpenOffice.org natively. To store
   FODT, simply choose the file type from the save dialog.
Reading ODT
^^^^^^^^^^^
The ODT document class reads FODT files and converts them into the internal
Docbook representation of the Document component:
.. include:: tutorial/05_00_read_fodt.php
:literal:
You can generate any of the supported document formats from the Docbook
representation.
FODT files may contain embedded media files, usually images, which will be
extracted during the import process. You can specify the directory where these
images will be stored through the ``imageDir`` option::
<?php
$odt->options->imageDir = '/path/to/your/images';
?>
The default is your system's temporary directory.
Since Open Document contains only little semantic information compared to
Docbook, the import mechanism performs heuristic detection of information like
emphasized text. This mechanism is still quite rudimentary and will be made
available as a public API once it has matured.
Writing ODT
^^^^^^^^^^^^^
FODT files can be written similar to any of the other formats supported by the
Document component:
.. include:: tutorial/05_01_write_fodt.php
:literal:
Styling ODT
^^^^^^^^^^^
FODT output can be styled using a CSS like language similar to `Styling the
PDF`_. Using simplified CSS you assign style rules to Docbook XML elements,
which are generated into automatic styles in the resulting Open Document. The
default styling rules (`default.css`__) are the same as for PDF.
__ http://www.openoffice.org/
__ https://svn.apache.org/repos/asf/incubator/zetacomponents/trunk/Document/src/pcss/style/default.css
Applying custom styles can be done as follows:
.. include:: tutorial/05_02_write_fodt_styled.php
:literal:
A detailed description of the available `style options`__ is available `here`__.
__ Document_styles.html
__ Document_styles.html
..
Local Variables:
mode: rst
fill-column: 79
End:
vim: et syn=rst tw=79