OSH Parser
andychu edited this page Dec 28, 2017 · 20 revisions
These facts are useful for the parsing contest.
- 15 lexer modes (lexical state)
- 233 IDs (token types / node types) in 23 kinds (`core/id_kind_test.py` shows this)
- 3 recursive descent parsers (command, word, `[[`)
- 1 Pratt parser (arithmetic)
- `[` fallback reuses `osh/bool_parser`
- TODO: modify asdl.py to show these stats?
  - X product types
  - X sum types with X alternatives
- What CPython opcodes does it use?
- How many lines of code does it use in CPython? (Compare with execution.)
- What is the distribution of ASDL string and array lengths per node type?
  - Note: there are several uses of string, not just token. Is this a good or bad optimization?
- Brace detection -- this is a separate metaprogramming pass (doesn't depend on input). This is a recursive parser, although it operates entirely on token types and not chars/strings?
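To illustrate the "1 Pratt parser (arithmetic)" item above, here is a minimal sketch of the Pratt (precedence climbing) technique for a tiny arithmetic language. It is not the OSH code: the token format, the binding-power table, and the names `tokenize`, `Parser`, and `evaluate` are all assumptions made up for this example, and it evaluates directly instead of building AST nodes.

```python
import operator
import re

# Hypothetical token pattern: integers, '**', and single-char operators.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\*\*|[-+*()]))")

# Left binding powers (assumed values; higher binds tighter).
LBP = {"+": 10, "-": 10, "*": 20, "**": 30}
RIGHT_ASSOC = {"**"}
UNARY_BP = 40  # unary minus binds tighter than '**', as in bash arithmetic
APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "**": operator.pow}


def tokenize(s):
    """Turn a string into a list of (kind, value) pairs ending with eof."""
    s = s.rstrip()
    tokens = []
    pos = 0
    while pos < len(s):
        m = TOKEN_RE.match(s, pos)
        if not m:
            raise SyntaxError("unexpected character at %d" % pos)
        if m.group(1) is not None:
            tokens.append(("num", int(m.group(1))))
        else:
            tokens.append(("op", m.group(2)))
        pos = m.end()
    tokens.append(("eof", None))
    return tokens


class Parser(object):
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def next(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse(self, rbp=0):
        """Parse and evaluate an expression whose operators bind tighter than rbp."""
        kind, val = self.next()
        # "Null denotation": tokens that can start an expression.
        if kind == "num":
            left = val
        elif val == "-":
            left = -self.parse(UNARY_BP)
        elif val == "(":
            left = self.parse(0)
            if self.next() != ("op", ")"):
                raise SyntaxError("expected )")
        else:
            raise SyntaxError("unexpected token %r" % (val,))
        # "Left denotation": keep consuming infix operators that bind tighter.
        while True:
            kind, op = self.peek()
            if kind != "op" or op not in LBP or LBP[op] <= rbp:
                break
            self.next()
            # Right-associative operators recurse with a slightly lower power.
            next_rbp = LBP[op] - 1 if op in RIGHT_ASSOC else LBP[op]
            left = APPLY[op](left, self.parse(next_rbp))
        return left


def evaluate(s):
    p = Parser(tokenize(s))
    result = p.parse()
    if p.peek()[0] != "eof":
        raise SyntaxError("trailing tokens")
    return result
```

For example, `evaluate("1 + 2 * 3")` groups the multiplication first, and `evaluate("2 ** 3 ** 2")` groups to the right. The whole grammar lives in one table plus one loop, which is the usual argument for using this style for expression sublanguages alongside recursive descent for everything else.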
- Per-Word Algorithms
  - `core/glob_.py`
    - `LooksLikeGlob`
    - `GlobEscape`
    - `GlobUnescape` (in case of no matches, may not be necessary)
    - regex escape, for passing to `regcomp()` (not done yet)
  - checking validity of names:
    - `for invalid-var in a b; do ...`
    - `readonly invalid-var`
  - `core/word_eval.py` -- after evaluating VarOp arguments, we compile globs to Python regexes, e.g. for `${x%foo*}`
  - IFS splitting (this is quite slow and needs to be sped up!)
  - `core/args.py` -- this is not a recursive parser
  - `echo -e` -- backslash escapes (and `printf` if it turns out we need it as a builtin)
  - `read` without -r -- backslash escapes are parsed
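As a rough illustration of the "compile globs to Python regexes" step mentioned for `${x%foo*}`, the sketch below removes the shortest suffix matching a glob. It is an assumption-laden stand-in, not what `core/word_eval.py` does: it borrows the standard library's `fnmatch.translate` for the glob-to-regex conversion (whose rules differ from shell globs in details such as backslash escaping), and `strip_shortest_suffix` is a hypothetical helper name.

```python
import fnmatch
import re


def strip_shortest_suffix(value, glob_pat):
    """Approximate ${value%glob_pat}: drop the shortest matching suffix."""
    # fnmatch.translate returns a regex source string for the glob.
    regex = re.compile(fnmatch.translate(glob_pat))
    # Try suffixes from shortest (empty) to longest (the whole string).
    for i in range(len(value), -1, -1):
        if regex.fullmatch(value[i:]):
            return value[:i]
    return value  # no suffix matched, so the value is unchanged


print(strip_shortest_suffix("barfooXfooY", "foo*"))
print(strip_shortest_suffix("file.tar.gz", ".*"))
```

The loop runs the regex once per candidate suffix, so it is quadratic in the worst case. That is acceptable for a sketch, but it is the kind of per-word cost worth measuring in a real implementation.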
- Polymorphism:
  - Reader
    - `FileLineReader`: file system e.g. `source`, `stdin`
    - `StringLineReader`: `eval`, `-c` and `PS4`
    - `VirtualLineReader`: here docs.
  - `BoolParser` can take `test_builtin._StringWordEmitter` or `WordParser`
  - `Arena` instances? Not sure that requires polymorphism, since there is one type right now. We might have different policies for tools vs. the runtime though.
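The Reader polymorphism listed above can be pictured with a small sketch. The class names come from this page, but everything else is assumed for illustration: the `GetLine()` method, the constructors, and the `count_lines` consumer are hypothetical and are not taken from the OSH source.

```python
import io


class _Reader(object):
    """Hypothetical interface: GetLine() returns the next line, or None at EOF."""

    def GetLine(self):
        raise NotImplementedError


class FileLineReader(_Reader):
    """Reads from a file-like object, e.g. for `source` or stdin."""

    def __init__(self, f):
        self.f = f

    def GetLine(self):
        line = self.f.readline()
        return line if line else None


class StringLineReader(FileLineReader):
    """Reads from an in-memory string, e.g. for `eval`, `-c`, or PS4."""

    def __init__(self, s):
        FileLineReader.__init__(self, io.StringIO(s))


class VirtualLineReader(_Reader):
    """Replays lines that were already read, e.g. the body of a here doc."""

    def __init__(self, lines):
        self.lines = lines
        self.pos = 0

    def GetLine(self):
        if self.pos == len(self.lines):
            return None
        line = self.lines[self.pos]
        self.pos += 1
        return line


def count_lines(reader):
    """Stand-in for a parser: it only depends on GetLine()."""
    n = 0
    while reader.GetLine() is not None:
        n += 1
    return n


print(count_lines(StringLineReader("echo hi\necho bye\n")))
print(count_lines(VirtualLineReader(["a\n", "b\n", "c\n"])))
```

The consumer never checks which reader it was given, which is the point of the polymorphism: the same parsing code can run over files, strings, and here doc bodies.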