-
-
Notifications
You must be signed in to change notification settings - Fork 163
FAQ: Why Not Write Oil in X?
This is a common set of questions, enough so that I'm making a wiki page about it.
- Background: Oil Is Being Implemented "Middle Out" (with language-oriented programming, by translating Python to C++)
- Particularly the appendix has links
I don't claim that Oil's strategy is the only way! (However I will note that every POSIX-compatible shell is written in C, and there are deeper reasons for that than you might expect. Shell is a thin layer over the Unix kernel, which has an interface specified in C. Its major dependency is libc.)
How to Rewrite Oil in Nim, C++, D, or Rust (or C) (Summer 2020)
I think OCaml is probably the "top contender" -- it's has algebraic data types, garbage collection, and a predictable native compiler.
However, a huge part of a shell codebase is lexing and parsing, and I believe those are more naturally done with imperative / stateful languages. (On the other hand, type checking is probably more natural in OCaml. I say "probably" because location info and errors tend to drown out attempts at clean, short code.)
For example, ocamlyacc is no different than a parser generator in C. There is no real advantage to OCaml there, and C/C++ are faster. re2c ended up being a perfect code generator for Oil, and I would have to write OCaml bindings for its generated code, which I don't know how to do.
- Related: The Morbig paper parsed POSIX shell with Menhir (similar to yacc), but Menhir required new features in order for this to work. Oil handles the much larger bash language with "lexer modes", recursive descent, and algebraic data types. See How To Parse Shell Like a Programming Language (most work was done in 2016)
-
Good Subthread With Examples of Imperative Programming in OCaml -- to my eyes, writing a simple
for
loop andcontinue
is awkward (short circuiting). Shell and parsing are full of such code.
I think D would be another top contender -- it's a "fast systems language" with garbage collection. It has builtin dictionaries! (which are used all over Oil)
Though I think we would still want something like Zephyr ASDL for algebraic data types.
You Can't Write a Portable POSIX Shell in Portable Go due to its threaded runtime, and the fact that it doesn't use libc. It doesn't necessarily mean Go is a bad choice, but it will cause more work.
I think Rust is promising in general, and it's obviously possible with enough effort.
But the shell AST is actually a big graph, and the parser is reused in an unusual way for interactive completion, giving it odd ownership semantics. That is, I think garbage collection is natural for this problem.
Also, garbage collection must occur somewhere -- either in the Oil language or the host language (C++ or Rust). I think having it in the host language is very nice. You can write garbage collectors in Rust, but I don't know how to do it.
- Another answer here: https://old.reddit.com/r/oilshell/comments/ralaw3/backlog_rough_progress_assessments/hnlksxc/
- GCC and Clang still support many more architectures than the Rust compiler. I'd like Oil to be built on weird embedded systems with limited compiler support. That's not a dealbreaker, but it's a consideration.
- Note that Zephyr ASDL is more expressive than Rust's algebraic data types in at least one dimension. (comment on that)
I think you could probably "compress" bash's 142K lines of C into 100K lines of C++ or so. But then the Oil language would bring that to perhaps 200K lines.
I don't think I'm capable of writing 200K lines of C++ from scratch with a good architecture! (A few people I've worked with probably could, but I can't, and I think even most "good" C++ programmers can't.)
The 10-40K lines of Python let me aggressively refactor the code for years! And after living with this code for many years, I'm happy with how it turned out.
Also:
- Parsing Shell Was Like Black Box Reverse Engineering, and the low latency of an interpreter helps. C++ is slow to compile.
- Python has garbage collection, and it also turned out to be a rich source of metalanguages:
- Zephyr ASDL for algebraic data types
- pgen2 for LL parsing
- a regex parser, which, along with re2c, produces state machines in pure C
- Gradual typing with MyPy
Note: a manual translation to Nim is being attempted: https://forum.nim-lang.org/t/6756#42018
There seems to be a belief that automatically translating Python to Nim is easier than translating Python to C++, but I don't think that's true. The similar indentation-based syntax doesn't make translation easier; the semantics and libraries are what matter.
https://old.reddit.com/r/oilshell/comments/gqrixg/oil_08pre5_progress_in_c/frw0sl7/
From what I understand, I think Nim could be a good language for writing a shell from scratch. Although one thing I didn't like is that the generated C code is not readable.
It is more like a control flow graph serialized into C, from what I remember.
An explicit goal of Oil's C++ translation is to be able to read, debug, profile the generated code with standard C++ tools, which are powerful and numerous. Functions are functions; loops are loops; ifs are ifs; etc.
One answer here: https://lobste.rs/s/e6u4zi/garbage_collected_heap_c_shaped_like#c_kgepb7
It's possible this would work, but it's much less straightforward than what we're doing. As far as I understand, building PyPy with RPython can take HOURS. There is a lot of fancy type inference going on. In contrast, translating and building oil-native
with a C++ compiler takes 30 seconds!
Also my understanding is that RPython is a relatively unpleasant C-like language which just happens to have Python syntax. It is very low level and explicit.
Whereas our MyPy / Zephyr ASDL dialect is a bit like writing OCaml. It is high level -- it uses strings as Python-like values, not strings as C-like buffers, etc.
-
Zulip: Implementation Language FAQ. (requires login). Go, Rust, D, Nim, etc.
- I may copy more links here as necessary