FAQ: Why Not Write Oil in X?
This is a common set of questions, enough so that I'm making a wiki page about it.
- Background: Oil Is Being Implemented "Middle Out" (with language-oriented programming, by translating Python to C++)
- Particularly the appendix has links
I don't claim that Oil's strategy is the only way! (However I will note that every POSIX-compatible shell is written in C, and there are deeper reasons for that than you might expect. Shell is a thin layer over the Unix kernel, which has an interface specified in C. Its major dependency is libc.)
How to Rewrite Oil in Nim, C++, D, or Rust (or C) (Summer 2020)
I think OCaml is probably the "top contender" -- it has algebraic data types, garbage collection, and a predictable native compiler.
However, a huge part of a shell codebase is lexing and parsing, and I believe those are more naturally done with imperative / stateful languages. (On the other hand, type checking is probably more natural in OCaml. I say "probably" because location info and errors tend to drown out attempts at clean, short code.)
For example, ocamlyacc is no different than a parser generator in C. There is no real advantage to OCaml there, and C/C++ are faster. re2c ended up being a perfect code generator for Oil, and I would have to write OCaml bindings for its generated code, which I don't know how to do.
- Related: The Morbig paper parsed POSIX shell with Menhir (similar to yacc), but Menhir required new features in order for this to work. Oil handles the much larger bash language with "lexer modes", recursive descent, and algebraic data types. See How To Parse Shell Like a Programming Language (most work was done in 2016)
- Good Subthread With Examples of Imperative Programming in OCaml -- to my eyes, writing a simple `for` loop and `continue` is awkward (short circuiting). Shell and parsing are full of such code.
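To make the "imperative / stateful" point concrete, here is a minimal Python sketch of a lexer with two modes. It is not Oil's actual lexer: the mode names, token kinds, and the `tokenize` function are hypothetical, and for brevity the loop toggles its own mode, which may differ from how Oil's parser drives its lexer. It only illustrates the style in question: a mutable position, a mutable mode, and a `continue` to skip whitespace.

```python
import re
from enum import Enum, auto


class Mode(Enum):
    OUTER = auto()  # unquoted shell words
    DQ = auto()     # inside double quotes


# Hypothetical rule tables: each mode has its own ordered (regex, kind) list.
RULES = {
    Mode.OUTER: [
        (re.compile(r'[ \t]+'), 'Space'),
        (re.compile(r'"'), 'DQuote'),
        (re.compile(r'\$[A-Za-z_][A-Za-z0-9_]*'), 'VarSub'),
        (re.compile(r'\$'), 'Lit'),          # lone dollar sign
        (re.compile(r'[^ \t"$]+'), 'Lit'),
    ],
    Mode.DQ: [
        (re.compile(r'"'), 'DQuote'),
        (re.compile(r'\$[A-Za-z_][A-Za-z0-9_]*'), 'VarSub'),
        (re.compile(r'\$'), 'Lit'),          # lone dollar sign
        (re.compile(r'[^"$]+'), 'Lit'),      # spaces are literal here
    ],
}


def tokenize(s):
    """Return a list of (kind, text) pairs, dropping unquoted whitespace."""
    tokens = []
    mode = Mode.OUTER
    pos = 0
    while pos < len(s):
        # Try the current mode's rules in order; first match wins.
        for pat, kind in RULES[mode]:
            m = pat.match(s, pos)
            if m:
                break
        else:
            raise ValueError('no rule matched at position %d' % pos)
        pos = m.end()

        if kind == 'DQuote':
            # A double quote switches between the two modes.
            mode = Mode.DQ if mode is Mode.OUTER else Mode.OUTER
        if kind == 'Space':
            continue  # whitespace separates words but is not a token
        tokens.append((kind, m.group()))
    return tokens
```

With this sketch, `tokenize('echo "a b" $x')` keeps the space inside the quotes as part of a `Lit` token, while the unquoted spaces are skipped.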
I think D would be another top contender -- it's a "fast systems language" with garbage collection. It has builtin dictionaries! (which are used all over Oil)
Though I think we would still want something like Zephyr ASDL for algebraic data types.
You Can't Write a Portable POSIX Shell in Portable Go due to its threaded runtime, and the fact that it doesn't use libc.
It doesn't necessarily mean Go is a bad choice, but it will cause more work.
Related comment and discussion about threads and fork(), which don't mix: https://news.ycombinator.com/item?id=31741222
I think Rust is promising in general, and it's obviously possible with enough effort.
But the shell AST is actually a big graph, and the parser is reused in an unusual way for interactive completion, giving it odd ownership semantics. I think garbage collection is natural for this problem.
Also, garbage collection must occur somewhere -- either in the Oil language or the host language (C++ / Rust). I think having it in the host language is very nice. You can write garbage collectors in (unsafe) Rust, but I don't know how to do it.
- Another answer here: https://old.reddit.com/r/oilshell/comments/ralaw3/backlog_rough_progress_assessments/hnlksxc/
- GCC and Clang still support many more architectures than the Rust compiler. I'd like Oil to be built on weird embedded systems with limited compiler support. That's not a dealbreaker, but it's a consideration.
- Note that Zephyr ASDL is more expressive than Rust's algebraic data types in at least one dimension. (comment on that)
I think you could probably "compress" bash's 142K lines of C into 100K lines of C++ or so. But then the Oil language would bring that to perhaps 200K lines.
I don't think I'm capable of writing 200K lines of C++ from scratch with a good architecture! (A few people I've worked with probably could, but I can't, and I think even most "good" C++ programmers can't.)
The 10-40K lines of Python let me aggressively refactor the code for years! And after living with this code for many years, I'm happy with how it turned out.
Also:
- Parsing Shell Was Like Black Box Reverse Engineering, and the low latency of an interpreter helps. C++ is slow to compile.
- Python has garbage collection, and it also turned out to be a rich source of metalanguages:
- Zephyr ASDL for algebraic data types
- pgen2 for LL parsing
- a regex parser, which, along with re2c, produces state machines in pure C
- Gradual typing with MyPy
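As a rough illustration of how algebraic data types and gradual typing combine in Python, here is a small hand-written sketch. It is not the output of Zephyr ASDL, and the names (`Literal`, `VarSub`, `Word`, `eval_word`) are hypothetical. It assumes a sum type can be modeled as a `Union` of dataclasses, which MyPy can then check.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Literal:
    """A literal piece of a word, e.g. 'hi '."""
    text: str


@dataclass
class VarSub:
    """A variable substitution, e.g. $x."""
    name: str


# The sum type: a word part is exactly one of these variants.
WordPart = Union[Literal, VarSub]


@dataclass
class Word:
    """A product type: a word is a sequence of parts."""
    parts: List[WordPart]


def eval_word(w: Word, env: Dict[str, str]) -> str:
    """Evaluate a word to a string; unset variables become ''."""
    out = []  # type: List[str]
    for part in w.parts:
        if isinstance(part, Literal):
            out.append(part.text)
        elif isinstance(part, VarSub):
            out.append(env.get(part.name, ''))
        else:
            raise AssertionError(part)
    return ''.join(out)
```

For example, `eval_word(Word([Literal('hi '), VarSub('x')]), {'x': 'there'})` returns `'hi there'`.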
Note: a manual translation to Nim is being attempted: https://forum.nim-lang.org/t/6756#42018
There seems to be a belief that automatically translating Python to Nim is easier than translating Python to C++, but I don't think that's true. The similar indentation-based syntax doesn't make translation easier; the semantics and libraries are what matter.
https://old.reddit.com/r/oilshell/comments/gqrixg/oil_08pre5_progress_in_c/frw0sl7/
From what I understand, I think Nim could be a good language for writing a shell from scratch. Although one thing I didn't like is that the generated C code is not readable.
It is more like a control flow graph serialized into C, from what I remember.
An explicit goal of Oil's C++ translation is to be able to read, debug, and profile the generated code with standard C++ tools, which are powerful and numerous. Functions are functions; loops are loops; ifs are ifs; etc.
One answer here: https://lobste.rs/s/e6u4zi/garbage_collected_heap_c_shaped_like#c_kgepb7
EDIT: It's possible that this would work; we have done an experiment along these lines.
It's less straightforward than what we're doing, since it has to infer types at build time
This has become a bit of a FAQ too! Related: Oil 0.12.7 - Garbage Collector Problems
- The technique is inherently unportable
- I would like Oil to be able to bootstrap OSes on weird architectures, without writing assembly code.
- It's pretty big and complex
- It's at least 33K lines of C code, and some (arch-specific) assembly.
- For comparison, we have less than 7K lines of hand-written C++ in Oil
- The Nix evaluator appears to be carrying around Boehm GC patches for Darwin. I don't want to become a Boehm maintainer!
- It's supposed to be "drop in", but in reality ...
- To make good use of it, you have to give it hints about where pointers may or may not be
- There are many tuning parameters. (author of Nix evaluator rewrite commented on this)
- The risk of imprecision is higher on 32-bit systems; a shell still has good use cases on 32-bit systems.
- The safety is questionable -- it changes when compilers change, and they have changed a lot since Boehm GC was initially developed
- Good perspective from Henderson about this in Accurate Garbage Collection in Uncooperative Environments (2002)
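The 32-bit point can be illustrated with some back-of-the-envelope arithmetic. This is a crude model, not a measurement of Boehm GC: it assumes a non-pointer machine word is a uniformly random bit pattern (real data is not) and that any value falling inside the heap's address range is treated as a pointer. The function name and the 256 MiB heap size are made up for illustration.

```python
def false_pointer_fraction(heap_bytes, pointer_bits):
    """Fraction of all possible word values that land inside the heap.

    Crude model: the heap is one contiguous range of heap_bytes addresses,
    and a word is any of 2**pointer_bits values with equal probability.
    """
    return heap_bytes / float(2 ** pointer_bits)


heap = 256 * 1024 * 1024  # hypothetical 256 MiB heap

frac32 = false_pointer_fraction(heap, 32)
frac64 = false_pointer_fraction(heap, 64)

print(frac32)           # 0.0625, i.e. 1 in 16 random words look like pointers
print(frac64)           # far smaller: 2**-36 of the 64-bit address space
print(frac32 / frac64)  # the 32-bit case is 2**32 times more exposed
```

Under these assumptions, the same heap covers a 2^32 times larger share of the address space on a 32-bit system, which is why stray integers are much more likely to be mistaken for pointers there.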
- Zulip: Implementation Language FAQ (requires login). Go, Rust, D, Nim, etc.
- I may copy more links here as necessary