forked from amauryfa/lxml
Things to try out when life permits
===================================

* zlib-based parsing/serialising of compressed in-memory data

  * requires a libxml2 I/O OutputBuffer with appropriate I/O functions
    that call into the zlib compression routines

* lzma-based parsing/serialising of compressed in-memory data

  * requires a libxml2 I/O OutputBuffer with appropriate I/O functions
    that call into the lzma compression routines

  * advantage over zlib: probably faster and better compression

  * maybe embed the lzma C sources in the distro

    http://www.7-zip.org/sdk.html

* generating XML using the ``with`` statement

  http://comments.gmane.org/gmane.comp.python.general/579950?set_lines=100000

* parse-time validation against a user-provided DTD

  * currently only works for XML Schema

* somehow integrate RelaxNG compact notation (rnc versus rng)

  * currently not supported by libxml2 (patch exists)

* support subclassing XSLTAccessControl to provide custom per-URL
  access check methods

  * maybe custom resolvers are enough, or can be combined with this?

* reimplement iterparse() using the libxml2 xmlReader API

  * Advantage: the implementation can be made safer than the current
    SAX implementation, as the parser would not interact with the
    Python-level tree.

  * Disadvantage: the tree has to be built manually. In the current
    SAX-based implementation, libxml2 does it for us.

* rewrite iterparse() to accept a parser as argument instead of being
  one

  * disadvantage: iterparse() can't deal with all parser options

* provide an HTMLParser wrapper that handles broken encodings in broken
  HTML better, e.g. using BeautifulSoup's "unicode dammit" analyser

* expose namespace prefixes through the QName class
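
For the zlib and lzma items, a user-level approximation is possible
without any new libxml2 I/O functions: compress the serialised bytes
after the fact and decompress them before parsing. The sketch below shows
only that workaround, not the proposed buffer integration. It assumes the
standard library's ``xml.etree.ElementTree`` as a stand-in for
``lxml.etree`` so that it runs without lxml installed, and the helper
names ``dump_compressed`` and ``load_compressed`` are hypothetical.

```python
import lzma
import zlib
import xml.etree.ElementTree as ET  # assumption: stand-in for lxml.etree


def dump_compressed(root, codec=zlib):
    """Serialise an element to bytes, then compress the whole buffer."""
    return codec.compress(ET.tostring(root, encoding="utf-8"))


def load_compressed(data, codec=zlib):
    """Decompress the whole buffer, then parse it into an element."""
    return ET.fromstring(codec.decompress(data))


root = ET.Element("root")
ET.SubElement(root, "child").text = "payload " * 100

# Both modules expose compress()/decompress(), so they are interchangeable here.
for codec in (zlib, lzma):
    blob = dump_compressed(root, codec)
    restored = load_compressed(blob, codec)
    assert restored.find("child").text == root.find("child").text
```

Unlike an I/O buffer that calls into the compression routines directly,
this approach holds the complete uncompressed document in memory at once.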
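
For the ``with`` statement item, one possible shape is a context manager
that opens an element on entry and closes it on exit, so that the nesting
of the code mirrors the nesting of the XML. This is a minimal sketch under
assumptions, not a proposed lxml API: the ``XMLWriter`` class and its
method names are hypothetical, and it builds on the standard library's
``xml.etree.ElementTree.TreeBuilder`` rather than on lxml.

```python
from contextlib import contextmanager
import xml.etree.ElementTree as ET  # assumption: stand-in for lxml.etree


class XMLWriter:
    """Hypothetical helper that builds a tree from nested ``with`` blocks."""

    def __init__(self):
        self._builder = ET.TreeBuilder()

    @contextmanager
    def element(self, tag, **attrib):
        # Open the element on entry ...
        self._builder.start(tag, attrib)
        try:
            yield self
        finally:
            # ... and close it on exit, even if the body raised.
            self._builder.end(tag)

    def text(self, data):
        self._builder.data(data)

    def close(self):
        # Returns the root element of the finished tree.
        return self._builder.close()


writer = XMLWriter()
with writer.element("root"):
    with writer.element("item", id="1"):
        writer.text("hello")

result = ET.tostring(writer.close(), encoding="unicode")
```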
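
On the iterparse() items, the sketch below illustrates the interaction
that the safety concern refers to: user code is handed elements of the
tree between parser events and is free to modify them. It uses the
standard library's ``xml.etree.ElementTree.iterparse()`` as a stand-in
(an assumption made so the example runs without lxml); lxml's
iterparse() takes the same ``events`` argument.

```python
import io
import xml.etree.ElementTree as ET  # assumption: stand-in for lxml.etree

data = b"<root><a>1</a><b>2</b></root>"

seen = []
for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
    seen.append((event, elem.tag))
    if event == "end" and elem.tag != "root":
        # User code mutates the tree that the parser is responsible for building.
        elem.clear()
```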