FAQ: Why Not Write Oil in X?

These questions come up often enough that I'm making a wiki page about them.

Background: Oil Is Being Implemented "Middle Out" (with language-oriented programming, by translating Python to C++)
- The appendix in particular has links

First: I Encourage Parallel Experiments in Other Languages

I don't claim that Oil's strategy is the only way! (However, I will note that every POSIX-compatible shell is written in C, and there are deeper reasons for that than you might expect. Shell is a thin layer over the Unix kernel, whose interface is specified in C. Its major dependency is libc.)

How to Rewrite Oil in Nim, C++, D, or Rust (or C) (Summer 2020)

Why Not OCaml?

I think OCaml is probably the "top contender" -- it has algebraic data types, garbage collection, and a predictable native compiler.

However, a huge part of a shell codebase is lexing and parsing, and I believe those are more naturally done with imperative / stateful languages. (On the other hand, type checking is probably more natural in OCaml. I say "probably" because location info and errors tend to drown out attempts at clean, short code.)

For example, ocamlyacc is no different than a parser generator in C. There is no real advantage to OCaml there, and C/C++ are faster. re2c ended up being a perfect code generator for Oil, and I would have to write OCaml bindings for its generated code, which I don't know how to do.

Related:
- The Morbig paper parsed POSIX shell with Menhir (similar to yacc), but Menhir required new features for this to work. Oil handles the much larger bash language with "lexer modes", recursive descent, and algebraic data types. See How To Parse Shell Like a Programming Language (most work was done in 2016).
- Good Subthread With Examples of Imperative Programming in OCaml. To my eyes, writing a simple for loop with continue (short-circuiting) is awkward there. Shell and parsing are full of such code.
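
To make "lexer modes" and the imperative loop style concrete, here is a minimal sketch in Python. The mode names, token names, and regexes are made up for illustration and are far simpler than Oil's real lexer. The point is that the same character (here `#`) lexes differently depending on the current mode, and that the main loop leans on mutable state and `continue`.

```python
import re
from typing import List, Tuple

# Hypothetical modes and token names, for illustration only.
OUTER = 'OUTER'  # ordinary command context
DQ = 'DQ'        # inside double quotes

RULES = {
    OUTER: [
        (re.compile(r'[ \t]+'), 'Space'),
        (re.compile(r'"'), 'LeftDQ'),
        (re.compile(r'#[^\n]*'), 'Comment'),
        (re.compile(r'[^ \t"#]+'), 'Word'),
    ],
    DQ: [
        (re.compile(r'"'), 'RightDQ'),
        (re.compile(r'\$[A-Za-z_][A-Za-z0-9_]*'), 'VarSub'),
        (re.compile(r'[^"$]+'), 'Chars'),
        (re.compile(r'\$'), 'Chars'),  # a lone $ is literal text
    ],
}


def lex(line: str) -> List[Tuple[str, str]]:
    """Return (kind, text) pairs, switching rule sets as the mode changes."""
    tokens: List[Tuple[str, str]] = []
    mode = OUTER
    pos = 0
    while pos < len(line):
        # Try the rules of the current mode in order; first match wins.
        for pat, kind in RULES[mode]:
            m = pat.match(line, pos)
            if m:
                break
        else:
            raise ValueError('no rule matches at position %d' % pos)
        pos = m.end()

        if kind in ('Space', 'Comment'):
            continue  # skip without emitting a token

        # Quote characters switch the mode for everything that follows.
        if kind == 'LeftDQ':
            mode = DQ
        elif kind == 'RightDQ':
            mode = OUTER
        tokens.append((kind, m.group(0)))
    return tokens
```

For example, `lex('echo "hi $name # x" # c')` keeps the `# x` inside the quotes as literal characters, while the trailing `# c` is dropped as a comment.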

Why Not D?

I think D would be another top contender -- it's a "fast systems language" with garbage collection. It has builtin dictionaries! (which are used all over Oil)

Though I think we would still want something like Zephyr ASDL for algebraic data types.

Why Not Go?

You Can't Write a Portable POSIX Shell in Portable Go because of its threaded runtime and because it doesn't use libc. That doesn't necessarily mean Go is a bad choice, but it will mean more work.
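
For context, the core of running an external command in a POSIX shell is fork, exec, and wait. Below is a minimal sketch in Python, whose `os` module wraps the corresponding libc calls. The function name `run_external` is hypothetical, and a real shell does much more between fork and exec (redirects, process groups, signal setup). That window, where the shell's own code runs in the forked child, is presumably where a threaded runtime makes things harder.

```python
import os
from typing import List


def run_external(argv: List[str]) -> int:
    """Fork, exec argv in the child, and return a shell-style exit status."""
    pid = os.fork()
    if pid == 0:
        # Child process: replace this process image with the command.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed (e.g. command not found). Exit immediately
            # without running the parent's cleanup handlers.
            os._exit(127)

    # Parent process: block until the child terminates.
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    # Killed by a signal: shells conventionally report 128 + signal number.
    return 128 + os.WTERMSIG(wait_status)
```

A call like `run_external(['ls', '-l'])` returns the exit status of `ls`, much as `$?` would report it in a shell.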

Why Not Rust?

I think Rust is promising in general, and it's obviously possible with enough effort.

But the shell AST is actually a big graph, and the parser is reused in an unusual way for interactive completion, giving it odd ownership semantics. That is, I think garbage collection is natural for this problem.

Also, garbage collection must occur somewhere -- either in the Oil language or the host language (C++ or Rust). I think having it in the host language is very nice. You can write garbage collectors in Rust, but I don't know how to do it.

Another answer here: https://old.reddit.com/r/oilshell/comments/ralaw3/backlog_rough_progress_assessments/hnlksxc/
- GCC and Clang still support many more architectures than the Rust compiler. I'd like Oil to be built on weird embedded systems with limited compiler support. That's not a dealbreaker, but it's a consideration.
Note that Zephyr ASDL is more expressive than Rust's algebraic data types in at least one dimension. (comment on that)

Why Not Write it By Hand in C++?

I think you could probably "compress" bash's 142K lines of C into 100K lines of C++ or so. But then the Oil language would bring that to perhaps 200K lines.

I don't think I'm capable of writing 200K lines of C++ from scratch with a good architecture! (A few people I've worked with probably could, but I can't, and I think even most "good" C++ programmers can't.)

The 10-40K lines of Python let me aggressively refactor the code for years! And after living with this code for many years, I'm happy with how it turned out.

Also:

- Parsing Shell Was Like Black Box Reverse Engineering, and the low latency of an interpreter helps. C++ is slow to compile.
- Python has garbage collection, and it also turned out to be a rich source of metalanguages:
  - Zephyr ASDL for algebraic data types
  - pgen2 for LL parsing
  - a regex parser, which, along with re2c, produces state machines in pure C
  - Gradual typing with MyPy

Why Not Nim?

Note: a manual translation to Nim is being attempted: https://forum.nim-lang.org/t/6756#42018

Semi-Automatic Translation

There seems to be a belief that automatically translating Python to Nim is easier than translating Python to C++, but I don't think that's true. The similar indentation-based syntax doesn't make translation easier; the semantics and libraries are what matter.

https://old.reddit.com/r/oilshell/comments/gqrixg/oil_08pre5_progress_in_c/frw0sl7/

Writing From Scratch

From what I understand, Nim could be a good language for writing a shell from scratch. One thing I didn't like, though, is that the generated C code is not readable.

It is more like a control flow graph serialized into C, from what I remember.

An explicit goal of Oil's C++ translation is to be able to read, debug, and profile the generated code with standard C++ tools, which are powerful and numerous. Functions are functions; loops are loops; ifs are ifs; etc.

Why Not Use [Dynamic Language With JIT] Instead of C++?

https://old.reddit.com/r/ProgrammingLanguages/comments/umlo1x/brief_descriptions_of_a_python_to_c_translator/i878m3u/

Why Not Run the Oil Interpreter with PyPy?

One answer here: https://lobste.rs/s/e6u4zi/garbage_collected_heap_c_shaped_like#c_kgepb7

Why Not Rewrite the Oil Interpreter in RPython, and Generate C Without a JIT?

It's possible this would work, but it's much less straightforward than what we're doing. As far as I understand, building PyPy with RPython can take HOURS. There is a lot of fancy type inference going on. In contrast, translating and building oil-native with a C++ compiler takes 30 seconds!

Also, my understanding is that RPython is a relatively unpleasant C-like language which just happens to have Python syntax. It is very low level and explicit.

By comparison, our MyPy / Zephyr ASDL dialect is a bit like writing OCaml. It is high level: it treats strings as Python-like values, not as C-like buffers, etc.
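
As a rough illustration of that style, here is a sketch of a small sum type with a recursive evaluator, written as statically typed Python. It uses hand-written dataclasses rather than classes generated from an ASDL schema, and the node names (`Literal`, `VarRef`, `DoubleQuoted`) and `eval_part` are hypothetical, not Oil's actual schema or code.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


# Hypothetical variants of a "word part" sum type.
@dataclass
class Literal:
    s: str


@dataclass
class VarRef:
    name: str


@dataclass
class DoubleQuoted:
    parts: List['word_part_t']


# The sum type: a value is exactly one of these variants.
word_part_t = Union[Literal, VarRef, DoubleQuoted]


def eval_part(part: word_part_t, env: Dict[str, str]) -> str:
    """Evaluate a word part to a string; strings are plain immutable values."""
    # A type checker such as MyPy narrows the type in each isinstance branch,
    # similar in spirit to pattern matching on variants.
    if isinstance(part, Literal):
        return part.s
    if isinstance(part, VarRef):
        return env.get(part.name, '')  # unset variables become empty strings
    if isinstance(part, DoubleQuoted):
        return ''.join(eval_part(p, env) for p in part.parts)
    raise AssertionError(part)
```

For instance, evaluating `DoubleQuoted([Literal('hi '), VarRef('name')])` with `{'name': 'world'}` concatenates the parts into a single string, with no manual buffer management anywhere.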

Links

Zulip: Implementation Language FAQ (requires login). Covers Go, Rust, D, Nim, etc.
- I may copy more links here as necessary
