-
-
Notifications
You must be signed in to change notification settings - Fork 163
Language Design and Theory of Computation
Deciding whether to print "syntax error" shouldn't require simulating a Turing machine.
-
Parsing Bash is Undecidable
- Bash, Perl, and Make can't be statically parsed because they interleave parsing and execution.
- C++ can't be statically parsed because it has two Turing complete computation mechanisms -- templates at compile time and normal code at runtime -- and parsing sits in between them.
-
Shell wasn't Turing complete when first designed.
-
Make wasn't Turing complete when first designed. See Example Code in Shell, Awk, and Make.
-
sedlisp is proof that sed is Turing complete. Not sure what constructs are used.
-
Accidentally Turing Complete -- nice comprehensive list, including some games which we don't care about here
- SQL
- However, with the features Common Table Expressions and Windowing, SQL is Turing Complete.
- https://wiki.postgresql.org/wiki/Cyclic_Tag_System
- Apache rewrite rules
- sendmail
- MediaWiki templates
- SQL
-
What about TeX and PostScript? I suspect those were both intended to be Turing Complete, although the syntax is odd.
-
printf is Turing complete.
%n
specifier writes to memory?- printbf -- Brainfuck interpreter in printf
- Section 6.2.1 of http://nebelwelt.net/publications/files/15SEC.pdf
When we control the arguments to printf(), it is possible to obtain Turing-complete computation. We show this formally in Appendix B by giving calls to printf() which create logic gates.
-
x86 MMU Fault Handling
- trapcc -- Compute with 0 instructions on Intel!
This is a proof by construction that the Intel MMU's fault handling mechanism is Turing complete. We have constructed an assembler that translates 'Move, Branch if Zero, Decrement' instructions to C source that sets up various processor control tables. After this code has executed, the CPU computes by attempting to fault without ever executing a single instruction.
TODO: Move to Parsing Is Difficult page?
Humans are better than computers at parsing programming languages.
To-do: collect instances of people arguing for grammars . For tools, IDEs?
- Nice series by Trevor Jim
- Python is not context-free? by Trevor Jim
-
Is Java context-free? by Trevor Jim.
- two grammars is a sign that context-free
- about 4 or 5 other posts
- [The Context-Sensitivity of C's Grammar](http://eli.thegreenplace.net/2007/11/24/the-context-sensitivity-of-cs grammar)
- Perl/C++/Make : not really on the table because they're not statically parseable.
- Ruby: experience with writing a full Ruby parser in Ruby https://news.ycombinator.com/item?id=13041646
-
Zinc Abstract Machine paper for CamlLight: the yacc parser had ~50 reduce/reduce conflicts and ~500 shift/reduce conflicts, or it had to be written in a convoluted style. Problem: Caml doesn't have closing delimiters.
-
CoffeeScript video: "Rise of Transpilers": "cheating" by adding fake tokens in the input
-
JRuby: cheating by massaging output
-
How Clang handles the type/variable name ambiguity of C++ -- My gripe is not with the grammar itself (although I admit it's needlessly complex), it's with the inability of Yacc-generated LALR(1) parsers to parse it without considerable hacks. As I've mentioned numerous times before, industrial-strength compilers for C/C++ exist after all, so they do manage to somehow parse these languages.
Alternative idea: parsers should be architected as pure functions from tokens to AST... (Doesn't matter what computational model) And then you can instrument them and empirically determine backtracking/efficiency ?
- TODO Review/Summarize papers
- Yakker -- scannerless, code available in OCaml
- Niall -- for data formats, length prefixes and checksums, etc.
- Pure Declarative Syntax Lost and Regained (GLR). From Nix authors.
- Langsec: this is a different but related issue. CFG is the wrong model. Models should be based on reality.
- finite vs infinite languages. Length prefixes.
Sweet spot? Context-sensitive / Turing complete, linear time lexing, and statically parseable
PEGs combine them, only for reasons of presentation.
Had this discussion twice: at work about GLR, and maybe about Gazelle. And on HN: https://news.ycombinator.com/item?id=13042835
Wrote up a wiki page that was popular: Why Lexing and Parsing Should Be Separate
- Turing completeness vs. side effects/capabilities. Both of these are design issues and shouldn't be conflated.
- Whether there is abstraction power like functions. I think Awk was Turing complete at first, but it didn't have functions. Related lobsters thread.