
FAQ: Why Not Write Oil in X?

andychu edited this page Jun 10, 2022 · 57 revisions

This is a common set of questions, enough so that I'm making a wiki page about it.

First: I Encourage Parallel Experiments in Other Languages

I don't claim that Oil's strategy is the only way! (However I will note that every POSIX-compatible shell is written in C, and there are deeper reasons for that than you might expect. Shell is a thin layer over the Unix kernel, which has an interface specified in C. Its major dependency is libc.)
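To make the "thin layer over the Unix kernel" point concrete, here is a minimal sketch of what a shell does to run one external command: fork, exec, wait. It is written in Python for brevity, and `run_external` is a hypothetical name, not a function from Oil. Each `os` call is a thin wrapper around the libc function of the same name.

```python
import os
from typing import List


def run_external(argv: List[str]) -> int:
    """Run argv as a child process and return a shell-style exit status."""
    pid = os.fork()  # fork(2): duplicate the current process
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)  # execvp(3): looks up argv[0] in PATH
        except OSError:
            # 127 is the conventional shell status for "command not found".
            os._exit(127)
    # Parent: block until the child finishes, like a foreground job.
    _, wait_status = os.waitpid(pid, 0)  # waitpid(2)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    # Shells conventionally report death by signal as 128 + signal number.
    return 128 + os.WTERMSIG(wait_status)


if __name__ == "__main__":
    print(run_external(["sh", "-c", "exit 3"]))
```

Almost every line here maps directly onto a C-level interface, which is part of why a runtime that bypasses libc or starts extra threads before `fork()` makes this harder.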

How to Rewrite Oil in Nim, C++, D, or Rust (or C) (Summer 2020)

Why Not OCaml?

I think OCaml is probably the "top contender" -- it has algebraic data types, garbage collection, and a predictable native compiler.

However, a huge part of a shell codebase is lexing and parsing, and I believe those are more naturally done with imperative / stateful languages. (On the other hand, type checking is probably more natural in OCaml. I say "probably" because location info and errors tend to drown out attempts at clean, short code.)
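As a rough illustration of the "stateful" part, here is a minimal sketch of a lexer whose rules depend on a mutable mode. The token names and the tiny grammar are invented for this example and are far simpler than real shell; it is not Oil's lexer. The point is only that the same characters (a space, a `$`) are tokenized differently depending on state that changes as input is consumed.

```python
import re
from typing import Iterator, Tuple

Token = Tuple[str, str]  # (kind, text); the kind names are made up for this sketch

# Each lexer mode has its own ordered rule list; the first matching rule wins.
RULES = {
    "OUTER": [
        ("DQUOTE", re.compile(r'"')),
        ("SPACE", re.compile(r"[ \t]+")),
        ("WORD", re.compile(r'[^ \t"]+')),
    ],
    "DQ": [
        ("DQUOTE", re.compile(r'"')),
        ("VAR", re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")),
        ("LIT", re.compile(r'[^"$]+|\$')),
    ],
}


def lex(line: str) -> Iterator[Token]:
    mode = "OUTER"  # mutable state: which rule set applies right now
    pos = 0
    while pos < len(line):
        for kind, pattern in RULES[mode]:
            m = pattern.match(line, pos)
            if m:
                break
        else:
            raise ValueError("no rule matches at position %d" % pos)
        yield (kind, m.group())
        pos = m.end()
        if kind == "DQUOTE":
            # A double quote toggles between the two modes.
            mode = "DQ" if mode == "OUTER" else "OUTER"


if __name__ == "__main__":
    for tok in lex('echo "a b $x"'):
        print(tok)
```

Inside the quotes, the space in `a b ` is part of a literal rather than a separator, and `$x` becomes its own token. In an imperative language this is one reassigned variable; in a purely functional style the mode has to be threaded through every call.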

For example, ocamlyacc is no different than a parser generator in C. There is no real advantage to OCaml there, and C/C++ are faster. re2c ended up being a perfect code generator for Oil, and I would have to write OCaml bindings for its generated code, which I don't know how to do.

Why not D?

I think D would be another top contender -- it's a "fast systems language" with garbage collection. It has builtin dictionaries! (which are used all over Oil)

Though I think we would still want something like Zephyr ASDL for algebraic data types.

Why Not Go?

You Can't Write a Portable POSIX Shell in Portable Go, because of its threaded runtime and the fact that it doesn't use libc. That doesn't necessarily mean Go is a bad choice, but it will mean more work.

Why Not Rust?

I think Rust is promising in general, and it's obviously possible with enough effort.

But the shell AST is actually a big graph, and the parser is reused in an unusual way for interactive completion, giving it odd ownership semantics. That is, I think garbage collection is natural for this problem.

Also, garbage collection must occur somewhere -- either in the Oil language or the host language (C++ or Rust). I think having it in the host language is very nice. You can write garbage collectors in Rust, but I don't know how to do it.

Why Not Write it By Hand in C++?

I think you could probably "compress" bash's 142K lines of C into 100K lines of C++ or so. But then the Oil language would bring that to perhaps 200K lines.

I don't think I'm capable of writing 200K lines of C++ from scratch with a good architecture! (A few people I've worked with probably could, but I can't, and I think even most "good" C++ programmers can't.)

The 10-40K lines of Python let me aggressively refactor the code for years! And after living with this code for many years, I'm happy with how it turned out.

Also:

  • Parsing Shell Was Like Black Box Reverse Engineering, and the low latency of an interpreter helps. C++ is slow to compile.
  • Python has garbage collection, and it also turned out to be a rich source of metalanguages:
    • Zephyr ASDL for algebraic data types
    • pgen2 for LL parsing
    • a regex parser, which, along with re2c, produces state machines in pure C
    • Gradual typing with MyPy

Nim?

Note: a manual translation to Nim is being attempted: https://forum.nim-lang.org/t/6756#42018

Semi-Automatic Translation

There seems to be a belief that automatically translating Python to Nim is easier than translating Python to C++, but I don't think that's true. The similar indentation-based syntax doesn't make translation easier; the semantics and libraries are what matter.

https://old.reddit.com/r/oilshell/comments/gqrixg/oil_08pre5_progress_in_c/frw0sl7/

Writing From Scratch

From what I understand, Nim could be a good language for writing a shell from scratch, although one thing I didn't like is that the generated C code is not readable.

It is more like a control flow graph serialized into C, from what I remember.

An explicit goal of Oil's C++ translation is to be able to read, debug, and profile the generated code with standard C++ tools, which are powerful and numerous. Functions are functions; loops are loops; ifs are ifs; etc.

Why Not Use [Dynamic Language With JIT] instead of C++?

Why Not Run the Oil Interpreter with PyPy?

One answer here: https://lobste.rs/s/e6u4zi/garbage_collected_heap_c_shaped_like#c_kgepb7

Why Not Rewrite the Oil Interpreter in RPython, and generate C without a JIT?

It's possible this would work, but it's much less straightforward than what we're doing. As far as I understand, building PyPy with RPython can take HOURS. There is a lot of fancy type inference going on. In contrast, translating and building oil-native with a C++ compiler takes 30 seconds!

Also, my understanding is that RPython is a relatively unpleasant C-like language which just happens to have Python syntax. It is very low level and explicit.

Whereas our MyPy / Zephyr ASDL dialect is a bit like writing OCaml. It is high level -- it uses strings as Python-like values, not strings as C-like buffers, etc.
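To give a flavor of that style, here is a minimal sketch of a sum type in statically typed Python. The class names are hypothetical and hand-written with Python 3 dataclasses purely for illustration; Zephyr ASDL generates such classes from a schema instead, and Oil's real node types differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Literal:
    value: str


@dataclass
class VarRef:
    name: str


@dataclass
class Concat:
    parts: List["WordPart"]


# The sum type: a WordPart is exactly one of these variants.
WordPart = Union[Literal, VarRef, Concat]


def evaluate(part: WordPart, env: Dict[str, str]) -> str:
    """Evaluate a word part to a string, dispatching on the variant."""
    if isinstance(part, Literal):
        return part.value
    if isinstance(part, VarRef):
        return env.get(part.name, "")  # unset variables evaluate to ""
    if isinstance(part, Concat):
        return "".join(evaluate(p, env) for p in part.parts)
    raise AssertionError("unhandled variant: %r" % (part,))


if __name__ == "__main__":
    word = Concat([Literal("hi "), VarRef("USER")])
    print(evaluate(word, {"USER": "andy"}))
```

The `isinstance` chain plays the role of an OCaml `match`, and a type checker such as MyPy can narrow the type in each branch. Strings are ordinary immutable values that get concatenated and returned, with no buffer management.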

Links
