Georgi Sabev edited this page Jun 16, 2021 · 2 revisions

Why?

The GrootFS ReExecer provides a way to execute functions from the GrootFS codebase in different process environments. For example, we can define a function that unpacks a tarball and have it execute in a chrooted filesystem with UIDs and GIDs mapped in a user namespace.

We could write the functions as separate binaries, but this approach allows us to keep the code together in a single binary for simplicity, and to easily reuse the environment preparation code.

Go Limitations

To run a function in a different environment, it must be executed as a new process; otherwise we would affect the parent process. On Linux, this means forking the current process and execing a binary in the newly forked child process. Go cannot expose this fork-then-exec model (although you could try doing it with raw syscalls) because of the way it uses OS-level threads to power its goroutine scheduling. After a fork, the child process would have just a single thread and would lose all the other threads needed to continue running Go code, in particular the exec call. Instead, Go provides exec.Command() to create an *exec.Cmd which, when run, forks and immediately execs a new process.

We can set the SysProcAttr.Cloneflags property on an *exec.Cmd to make the process run in a new namespace. In a language with separate fork and exec calls, the namespace would be created at the fork, then some preprocessing could set up the cloned namespace (for example, setting a new hostname in a UTS namespace) before exec is finally invoked. In Go, there is no place between fork and exec to perform such preprocessing. To launch a command from Go code in an initialised namespace, that command needs to be wrapped in another command that performs the initialisation. If that wrapper is implemented in Go, it does the initialisation and then runs another exec.Command() to finally invoke the desired command.

We use the reexec pattern to provide a space for performing that initialisation. See Ed King's blog post for further background.

Containers/Storage ReExecer

GrootFS uses the reexec package from containers/storage. Note that you will also find something very similar, if not identical, in the Docker repositories.

The reexecer principle is straightforward:

  • A map of IDs to functions is maintained by the package

    • Use the Register(id string, fn func()) function to add to this map.
    • Register() should be invoked from the init() function in the package next to your function definition
  • An Init() function is provided

    • This looks at os.Args[0]. If this matches an ID in the map, the associated function is invoked.
    • The return value is true if a function was invoked, else false.
    • In an init() function, which need not be where the Register() is called, invoke Init():
    if reexec.Init() {
        os.Exit(0)
    }
    
    • The os.Exit() ensures we don't go on to execute main() when we don't intend to.
  • The normal code flow invokes a reexeced function by running an *exec.Cmd created with reexec.Command()

    • This uses /proc/self/exe as the binary and sets args[0] appropriately
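The principle in the list above can be sketched with the standard library alone. This is not the containers/storage package itself: the lowercase names register, initReexec and command, and the ID say-hello, are stand-ins invented for this illustration, and the real package may differ in its details. The /proc/self/exe path makes the sketch Linux-specific.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// registry maps reexec IDs to the functions they trigger.
var registry = map[string]func(){}

// register adds fn under id. In this sketch a duplicate ID is treated as a
// programming error.
func register(id string, fn func()) {
	if _, exists := registry[id]; exists {
		panic(fmt.Sprintf("reexec func already registered under %q", id))
	}
	registry[id] = fn
}

// initReexec runs the function registered under os.Args[0], if there is one,
// and reports whether a function was run.
func initReexec() bool {
	if fn, ok := registry[os.Args[0]]; ok {
		fn()
		return true
	}
	return false
}

// command builds a Cmd that re-runs the current binary with args[0] set to id.
func command(id string, args ...string) *exec.Cmd {
	return &exec.Cmd{
		Path: "/proc/self/exe",
		Args: append([]string{id}, args...),
	}
}

func init() {
	register("say-hello", func() {
		fmt.Println("hello from the reexeced child, pid", os.Getpid())
	})
	if initReexec() {
		os.Exit(0) // in the child: never fall through to main()
	}
}

func main() {
	// Normal code flow: start the registered function in a new process.
	cmd := command("say-hello")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "reexec failed:", err)
		os.Exit(1)
	}
}
```

When the binary is started normally, os.Args[0] matches no ID, so init() returns and main() runs. In the child, os.Args[0] is say-hello, so the registered function runs and the process exits before reaching main().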

GrootFS ReExecer

The GrootFS ReExecer wraps the reexecer above. It transparently provides the environment initialisation required to map user and group IDs in user namespaces and to set up chroots.

Instead of registering reexec functions with the containers/storage/reexec package, use the grootfs/sandbox package's Register() function. The function you pass has a different signature: it takes a logger, a slice of files and variadic extra args, and returns an error.

A reexec function is executed using the Reexec() method on the object returned from sandbox.NewReexecer().

Double Wrapping

In fact, the GrootFS ReExecer doubly wraps the Containers/Storage reexecer. When registering a function, first another function with the ID suffix -wrapper is registered, and this reexecs the intended function. See the code for the details: first registration, second registration. This commit describes the rationale behind this approach.
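The shape of this double registration can be sketched as follows, using only the standard library. The names registerSandboxed and run, the simplified func() signature and the unpack ID are hypothetical and chosen for this illustration. What the real wrapper does before reexecing is defined in the GrootFS code, so a print statement stands in for that step here. The sketch is Linux-specific because of /proc/self/exe.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// registry maps reexec IDs to functions, as in the plain reexec pattern.
var registry = map[string]func(){}

const wrapperSuffix = "-wrapper"

// registerSandboxed (hypothetical name) registers fn under id, plus a second
// function under id+"-wrapper" that reexecs id.
func registerSandboxed(id string, fn func()) {
	registry[id] = fn
	registry[id+wrapperSuffix] = func() {
		// Placeholder only: the real environment initialisation is not shown.
		fmt.Println("wrapper: environment prepared, pid", os.Getpid())
		if err := run(id); err != nil {
			fmt.Fprintln(os.Stderr, "wrapper: inner reexec failed:", err)
			os.Exit(1)
		}
	}
}

// run reexecs the current binary with args[0] set to id and waits for it.
func run(id string) error {
	cmd := &exec.Cmd{
		Path:   "/proc/self/exe",
		Args:   []string{id},
		Stdout: os.Stdout,
		Stderr: os.Stderr,
	}
	return cmd.Run()
}

func init() {
	registerSandboxed("unpack", func() {
		fmt.Println("unpack: running in the prepared environment, pid", os.Getpid())
	})
	if fn, ok := registry[os.Args[0]]; ok {
		fn()
		os.Exit(0) // reexeced processes never reach main()
	}
}

func main() {
	// The parent starts the wrapper, which in turn starts the real function.
	if err := run("unpack" + wrapperSuffix); err != nil {
		fmt.Fprintln(os.Stderr, "reexec failed:", err)
		os.Exit(1)
	}
}
```

Running this sketch involves three processes: the original one, the wrapper, and the intended function, each with a different PID.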
