diff --git a/CHANGELOG.md b/CHANGELOG.md
index aebb466e..67c2dfe1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 - MPI hook: added support for the environment variable `MPI_COMPATIBILITY_TYPE` that defines the behaviour of the compatibility check of the libraries that the hook mounts. Valid values are `major`, `full` and `strict`. Default value is `major`.
+- MPI hook: added support for the `HOOK_ROOTLESS` environment variable, which enables using the hook under rootless container runtimes
 - SSH Hook: added a poststop functionality that kills the Dropbear process in case the hook does not join the container's PID namespace.
 - Added the `sarus ps` command to list running containers
 - Added the `sarus kill` command to terminate (and subsequently remove) containers
diff --git a/CI/installation/install_packages_opensuseleap:15.5.sh b/CI/installation/install_packages_opensuseleap:15.5.sh
index f404b0bc..fcceff43 100755
--- a/CI/installation/install_packages_opensuseleap:15.5.sh
+++ b/CI/installation/install_packages_opensuseleap:15.5.sh
@@ -3,7 +3,7 @@
 set -ex
 
 # Install packages
-sudo zypper install -y gcc-c++ glibc-static wget which git gzip bzip2 tar \
+sudo zypper install -y gcc-c++ glibc-static wget which git gzip bzip2 tar procps \
     make autoconf automake squashfs cmake zlib-devel zlib-devel-static \
     runc tini-static skopeo umoci \
     libboost_filesystem1_75_0-devel \
diff --git a/doc/config/mpi-hook.rst b/doc/config/mpi-hook.rst
index 456d3302..1ff8df3a 100644
--- a/doc/config/mpi-hook.rst
+++ b/doc/config/mpi-hook.rst
@@ -32,21 +32,23 @@ Hook configuration
 ==================
 
 The program is meant to be run as a **createContainer** hook and does not accept
-arguments, but its actions are controlled through a few environment variables:
+arguments. The following environment variables must be defined:
 
-* ``LDCONFIG_PATH`` (REQUIRED): Absolute path to a trusted ``ldconfig``
+* ``LDCONFIG_PATH``: Absolute path to a trusted ``ldconfig``
   program **on the host**.
 
-* ``MPI_LIBS`` (REQUIRED): Colon separated list of full paths to the host's
+* ``MPI_LIBS``: Colon separated list of full paths to the host's
   libraries that will substitute the container's libraries. The ABI
   compatibility is checked by comparing the version numbers specified in
   the libraries' file names according to the specifications selected with the
   variable ``MPI_COMPATIBILITY_TYPE``.
 
-* ``MPI_COMPATIBILITY_TYPE`` (OPTIONAL): String determining the logic adopted
+The following optional environment variables are also supported:
+
+* ``MPI_COMPATIBILITY_TYPE``: String determining the logic adopted
   to check the ABI compatibility of MPI libraries.
-  Must be one of ``major``, ``full``, or ``strict``.
-  If not defined, defaults to ``major``.
+  If defined, must be one of ``major``, ``full``, or ``strict``.
+  If unset or set to an unexpected value, defaults to ``major``.
   The checks performed for compatibility in the different cases are as follows:
 
   * ``major``
@@ -76,17 +78,31 @@ arguments, but its actions are controlled through a few environment variables:
     This compatibility check is in agreement with the MPICH ABI version
     number schema.
 
-* ``MPI_DEPENDENCY_LIBS`` (OPTIONAL): Colon separated list of absolute paths to
+* ``MPI_DEPENDENCY_LIBS``: Colon separated list of absolute paths to
   libraries that are dependencies of the ``MPI_LIBS``. These libraries are
   always bind mounted in the container under ``/usr/lib``.
 
-* ``BIND_MOUNTS`` (OPTIONAL): Colon separated list of absolute paths to generic
+* ``BIND_MOUNTS``: Colon separated list of absolute paths to generic
   files or directories that are required for the correct functionality of the
   host MPI implementation (e.g. specific device files). These resources will
   be bind mounted inside the container with the same path they have on the host.
   If a path corresponds to a device file, that file will be whitelisted for
   read/write access in the container's devices cgroup.
 
+* ``HOOK_ROOTLESS``: String indicating whether the hook is being run under
+  a rootless container runtime. It determines some of the actions undertaken by
+  the hook before performing its bind mounts, for example whether identity
+  switches are required to validate the mounts or to work with "root squashed"
+  filesystems.
+  By default, the hook operates in fully privileged mode, assuming "real root"
+  capabilities. This is the way the hook is run under Sarus, and in such a case
+  it is recommended to leave this environment variable unset.
+
+  If this variable is set to ``True`` (case-insensitive), the hook assumes
+  rootless execution.
+  This setting is intended to enable using the hook under unprivileged tools
+  like rootless Podman or Enroot.
+
 The following is an example of `OCI hook JSON configuration file `_ enabling the MPI hook:
diff --git a/doc/config/ssh-hook.rst b/doc/config/ssh-hook.rst
index a8e55f1b..5c1f828d 100644
--- a/doc/config/ssh-hook.rst
+++ b/doc/config/ssh-hook.rst
@@ -30,7 +30,7 @@ In the prestart stage the hook sets up the container to accept connections and s
 In the poststop stage, cleanup of the SSH daemon process takes place.
 One OCI hook JSON configuration files is sufficient, provided it defines ``"stages": ["prestart", "poststop"]``.
 
-The configuration of the ssh hook expects to receive its own name/location as the first argument,
+The hook expects to receive its own name/location as the first argument,
 and the string ``start-ssh-daemon`` as positional argument.
 In addition, the following environment variables must be defined:
diff --git a/src/hooks/mpi/MpiHook.cpp b/src/hooks/mpi/MpiHook.cpp
index a3c2fd23..02b3db41 100644
--- a/src/hooks/mpi/MpiHook.cpp
+++ b/src/hooks/mpi/MpiHook.cpp
@@ -129,6 +129,10 @@ void MpiHook::parseEnvironmentVariables() {
         abiCompatibilityCheckerType = std::string(p);
     }
 
+    if ((p = getenv("HOOK_ROOTLESS")) != nullptr) {
+        rootless = (boost::algorithm::to_upper_copy(std::string(p)) == std::string("TRUE"));
+    }
+
     log("Successfully parsed environment variables", libsarus::LogLevel::INFO);
 }
@@ -252,7 +256,7 @@ void MpiHook::injectHostLibrary(const SharedLibrary& hostLib,
     if (it == hostToContainerLibs.cend()) {
         log(boost::format{"no corresponding libs in container => bind mount (%s) into /lib"} % hostLib.getPath(), libsarus::LogLevel::DEBUG);
         auto containerLib = "/lib" / hostLib.getPath().filename();
-        libsarus::mount::validatedBindMount(hostLib.getPath(), containerLib, userIdentity, rootfsDir);
+        libsarus::mount::validatedBindMount(hostLib.getPath(), containerLib, userIdentity, rootfsDir, 0, rootless);
         createSymlinksInDynamicLinkerDefaultSearchDirs(containerLib, hostLib.getPath().filename(), false);
         return;
     }
@@ -265,14 +269,14 @@ void MpiHook::injectHostLibrary(const SharedLibrary& hostLib,
     auto areCompatible{abiCompatibilityChecker->check(hostLib, bestCandidateLib)};
     if(areCompatible.second == boost::none) {
         log(boost::format{"abi-compatible => bind mount host lib (%s) on top of container lib (%s) (i.e. override)"} % hostLib.getPath() % bestCandidateLib.getPath(), libsarus::LogLevel::DEBUG);
-        libsarus::mount::validatedBindMount(hostLib.getPath(), bestCandidateLib.getPath(), userIdentity, rootfsDir);
+        libsarus::mount::validatedBindMount(hostLib.getPath(), bestCandidateLib.getPath(), userIdentity, rootfsDir, 0, rootless);
         createSymlinksInDynamicLinkerDefaultSearchDirs(bestCandidateLib.getPath(), hostLib.getPath().filename(), containerHasLibsWithIncompatibleVersion);
         log("Successfully injected host's shared lib", libsarus::LogLevel::DEBUG);
         return;
     }
     log(areCompatible.second.get(), libsarus::LogLevel::INFO);
     auto containerLib = "/lib" / hostLib.getPath().filename();
-    libsarus::mount::validatedBindMount(hostLib.getPath(), containerLib, userIdentity, rootfsDir);
+    libsarus::mount::validatedBindMount(hostLib.getPath(), containerLib, userIdentity, rootfsDir, 0, rootless);
     if (areCompatible.first) {
         createSymlinksInDynamicLinkerDefaultSearchDirs(containerLib, hostLib.getPath().filename(), containerHasLibsWithIncompatibleVersion);
     } else {
@@ -296,7 +300,7 @@ void MpiHook::performBindMounts() const {
     auto devicesCgroupPath = boost::filesystem::path{};
     for(const auto& mount : bindMounts) {
-        libsarus::mount::validatedBindMount(mount, mount, userIdentity, rootfsDir);
+        libsarus::mount::validatedBindMount(mount, mount, userIdentity, rootfsDir, 0, rootless);
         if (libsarus::filesystem::isDeviceFile(mount)) {
             if (devicesCgroupPath.empty()) {
diff --git a/src/hooks/mpi/MpiHook.hpp b/src/hooks/mpi/MpiHook.hpp
index acc5f82d..07287106 100644
--- a/src/hooks/mpi/MpiHook.hpp
+++ b/src/hooks/mpi/MpiHook.hpp
@@ -61,6 +61,7 @@ class MpiHook {
     void log(const boost::format& message, libsarus::LogLevel level) const;
 
 private:
+    bool rootless = false;
     libsarus::hook::ContainerState containerState;
     boost::filesystem::path rootfsDir;
     libsarus::UserIdentity userIdentity;
diff --git a/src/libsarus/utility/mount.cpp b/src/libsarus/utility/mount.cpp
index 521ff2bb..483fdb12 100644
--- a/src/libsarus/utility/mount.cpp
+++ b/src/libsarus/utility/mount.cpp
@@ -171,12 +171,22 @@ void validatedBindMount(const boost::filesystem::path& source,
                         const boost::filesystem::path& destination,
                         const UserIdentity& userIdentity,
                         const boost::filesystem::path& rootfsDir,
-                        const unsigned long flags) {
+                        const unsigned long flags,
+                        const bool rootless) {
+    // This function assumes it is run as the root user, since it needs privileges to perform bind mounts.
+    // However, it is necessary to distinguish whether we are running within a fully privileged context
+    // (i.e. real root, e.g. within an suid program) or within a rootless/unprivileged context (i.e. under a user namespace).
+    // In the first case, we need to switch to the user identity to avoid complications due to root_squashed network
+    // filesystems; in the second case, identity switching is not necessary, since the kernel will always resolve
+    // to our true (unprivileged) identity in the topmost user namespace.
+    // To solve this somewhat elegantly, if we run as rootless, we set the target identity for the switches
+    // to the id which we already have. process::switchIdentity() is a no-op when attempting to switch to the same identity.
     auto rootIdentity = UserIdentity{};
+    auto targetIdentity = rootless ? rootIdentity : userIdentity;
 
     try {
         // switch to user identity to make sure user has access to the mount source
-        process::switchIdentity(userIdentity);
+        process::switchIdentity(targetIdentity);
         auto sourceReal = getValidatedMountSource(source);
         auto destinationReal = getValidatedMountDestination(destination, rootfsDir);
@@ -199,10 +209,14 @@ void validatedBindMount(const boost::filesystem::path& source,
             filesystem::createFileIfNecessary(destinationReal, userIdentity.uid, userIdentity.gid);
         }
 
-        // switch to user filesystem identity to make sure we can access paths as root even on root_squashed filesystems
-        process::setFilesystemUid(userIdentity);
+        // If we are real root, switch to user filesystem identity (fsuid) to make sure we can access paths as root
+        // even on root_squashed filesystems.
+        // There is no dedicated way of retrieving the current fsuid, and calling setfsuid(-1) will immediately error out
+        // if we are rootless, because we have no CAP_SETUID.
+        // So in the end we have no way of telling whether we can no-op the fsuid switch, hence we have to use an explicit condition.
+        if (!rootless) process::setFilesystemUid(userIdentity);
         bindMount(sourceReal, destinationReal, flags);
-        process::setFilesystemUid(rootIdentity);
+        if (!rootless) process::setFilesystemUid(rootIdentity);
     }
     catch(Error& e) {
         // Restore root identity in case the exception happened while having a non-privileged id.
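Reviewer note: the rootless decisions introduced by this patch can be summarised in a small standalone sketch. It is only an illustration under stated assumptions, not the libsarus implementation: `Identity`, `parseRootlessFlag`, `selectTargetIdentity` and `shouldSwitchFilesystemUid` are hypothetical names, and the default-constructed identity is assumed to be root (0:0), as the patch comments suggest for `UserIdentity{}`.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical stand-in for libsarus::UserIdentity.
// Assumption: a default-constructed identity means root (uid 0, gid 0).
struct Identity {
    unsigned uid = 0;
    unsigned gid = 0;
};

// Same rule as the HOOK_ROOTLESS parsing in the patch: only a
// case-insensitive "true" enables rootless mode; unset or any other
// value keeps the privileged default.
bool parseRootlessFlag(const char* value) {
    if (value == nullptr) {
        return false;
    }
    std::string s(value);
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s == "TRUE";
}

// Identity to switch to before validating mount paths.
// Rootless: stay on the identity we already have, so the switch is a no-op.
// Privileged: switch to the user, so that root_squashed filesystems are accessible.
Identity selectTargetIdentity(bool rootless, const Identity& root, const Identity& user) {
    return rootless ? root : user;
}

// The fsuid switch around the bind mount is only attempted as real root,
// because a rootless process lacks CAP_SETUID.
bool shouldSwitchFilesystemUid(bool rootless) {
    return !rootless;
}
```

A caller would typically pass `std::getenv("HOOK_ROOTLESS")` to `parseRootlessFlag` and forward the result to the mount helper, which is what the `rootless` member does in `MpiHook`.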
diff --git a/src/libsarus/utility/mount.hpp b/src/libsarus/utility/mount.hpp
index 0e241385..777e927c 100644
--- a/src/libsarus/utility/mount.hpp
+++ b/src/libsarus/utility/mount.hpp
@@ -37,7 +37,8 @@ void validatedBindMount(const boost::filesystem::path& source,
                         const boost::filesystem::path& destination,
                         const UserIdentity& userIdentity,
                         const boost::filesystem::path& rootfsDir,
-                        const unsigned long flags=0);
+                        const unsigned long flags=0,
+                        const bool rootless=false);
 void bindMount(const boost::filesystem::path& from, const boost::filesystem::path& to, unsigned long flags=0);
 void loopMountSquashfs(const boost::filesystem::path& image, const boost::filesystem::path& mountPoint);
 void mountOverlayfs(const boost::filesystem::path& lowerDir,
diff --git a/src/libsarus/utility/process.cpp b/src/libsarus/utility/process.cpp
index 4db61f27..5af1ac6a 100644
--- a/src/libsarus/utility/process.cpp
+++ b/src/libsarus/utility/process.cpp
@@ -58,6 +58,12 @@ void switchIdentity(const libsarus::UserIdentity& identity) {
     uid_t euid = geteuid();
     uid_t egid = getegid();
 
+    if (euid == identity.uid && egid == identity.gid) {
+        logMessage(boost::format{"Switching to the same identity. Ignoring"},
+                   LogLevel::DEBUG);
+        return;
+    }
+
     if (euid == 0){
         // unprivileged processes cannot call setgroups
         if (setgroups(identity.supplementaryGids.size(), identity.supplementaryGids.data()) != 0) {