From d13605a9a30b8c856433fd0bce50046f967e620f Mon Sep 17 00:00:00 2001
From: Norman Feske
Date: Tue, 13 Aug 2013 12:31:43 +0200
Subject: [PATCH] Release notes for version 13.08

---
 doc/release_notes-13-08.txt | 995 ++++++++++++++++++++++++++++++++++++
 1 file changed, 995 insertions(+)
 create mode 100644 doc/release_notes-13-08.txt

diff --git a/doc/release_notes-13-08.txt b/doc/release_notes-13-08.txt
new file mode 100644
index 00000000000..7e323144bdb
--- /dev/null
+++ b/doc/release_notes-13-08.txt
@@ -0,0 +1,995 @@


 ===============================================
 Release notes for the Genode OS Framework 13.08
 ===============================================

                               Genode Labs



The release of version 13.08 marks the 5th anniversary of the Genode OS
framework. We celebrate this anniversary with the addition of three major
features that we have much longed for, namely the port of Qt5 to Genode,
profound multi-processor support, and a light-weight event tracing
framework. Additionally, the new version comes with new device drivers for
SATA 3.0 and power management for the Exynos-5 SoC, improved virtualization
support on NOVA for x86, updated kernels, and integrity checks for
downloaded 3rd-party source code.

Over the course of the past five years, Genode's development was primarily
motivated by adding and cultivating features to make the framework fit for as
many application areas as possible. Now that we have a critical mass of
features, the focus on mere functionality does not suffice anymore. The
question of what Genode can do ultimately turns into the question of how well
Genode can do something: How stable is a certain workload? How does networking
perform? How does it scale to multi-processor systems? Because we are lacking
concise answers to these kinds of questions, we have to investigate.

When talking about stability, our recently introduced automated testing
infrastructure makes us more confident than ever. Each night, over 200
automated tests are performed, covering various kernels and several hardware
platforms. All those tests are publicly available in the form of so-called run
scripts and are under continuous development.

Regarding performance investigations, we have recently begun to benchmark
application performance with a focus on network throughput. Interestingly, our
measurements reveal significant differences between the used kernels, but
also shortcomings in our software stack. For example, we currently see that
our version of lwIP performs poorly with gigabit networking. To thoroughly
investigate such performance issues, the current version adds support
for tracing the behaviour of Genode components. This will allow us to get a
profound understanding of all inter-component interactions that are on the
critical path for the performance of complex application-level workloads.
Thanks to the Genode architecture, we could come up with a strikingly simple,
yet powerful design for a tracing facility. Section
[Light-weight event tracing] explains how it works.

When it comes to multi-processor scalability, we used to shy away from such
inquiries because, honestly, we hadn't paid much attention to the topic.
This view has changed by now. With the current release, we integrated the
management of CPU affinities right into the heart of the framework, i.e.,
Genode's session concept. Additionally, we cracked a damn hard nut by enabling
Genode to use multiple CPUs on the NOVA hypervisor.
This kernel is by far the
most advanced Open-Source microkernel for the x86 architecture. However,
NOVA's MP model seemed to inherently contradict the API design of Genode.
Fortunately, we found a fairly elegant way to go forward and were able to
tame the beast. Section [Enhanced multi-processor support] goes into more
detail.

Functionality-wise, we always considered the availability of Qt on Genode as a
big asset. With the current release, we are happy to announce that we finally
made the switch from Qt4 to Qt5. Section [Qt5 available on all kernels] gives
insights into the challenges that we faced during the porting work.

In addition to those highlights, the new version comes with improvements all
over the place. To name a few, there is improved support for POSIX threads,
updated device drivers, an updated version of the Fiasco.OC kernel and
L4Linux, and new device drivers for Exynos-5. Finally, the problem of
verifying the integrity of downloaded 3rd-party source code has been
addressed.


Qt5 available on all kernels
############################

Since its integration with Genode version 9.02, Qt4 has been regarded as
one of the most prominent features of the framework. For users, combining Qt
with Genode makes the world of sophisticated GUI-based end-user applications
available on various microkernels. For Genode developers, Qt represents by far
the most complex workload natively executed on top of the framework API,
thereby stressing the underlying system in every way imaginable. We have been
keeping an eye on Qt version 5 for a while and very much welcome the direction
in which Qt is heading. We think that the time is right to leave Qt4 behind
and embrace Qt5 for Genode.

For the time being, both Qt4 and Qt5 are available for Genode, but Qt4 is
declared as deprecated and will be removed with the upcoming version 13.11.
Since Qt5 is almost API-compatible with Qt4, the migration path is relatively
smooth. So we recommend moving your applications over to Qt5 during the
next release cycle.

In the following, we briefly describe the challenges we faced while adding Qt5
support to Genode, point you to the place where to find Qt5 in the source
tree, and give quick-start instructions for getting a Qt5 application
scenario running.

We found that the biggest architectural difference between version 4 and
version 5 is the use of the so-called Qt Platform Abstraction (QPA) interface,
which replaces the former Qt Window System (QWS).


Moving from QWS to QPA
======================

With Qt4, we relied on QWS to perform the window handling. A Qt4 application
used to create a session to Genode's GUI server (called nitpicker) and applied
its graphical output onto a virtual framebuffer. The virtual framebuffer was
not visible per se. To make portions of the virtual framebuffer visible on
screen, the application had to create so-called nitpicker views. A view is a
rectangular area of the physical screen that displays (a portion of) the
virtual framebuffer. The position, size, and stacking order of views is
managed by the application. For each Qt window, the application would simply
create a corresponding nitpicker view and maintain the consistency of the view
with the geometry of the Qt window.
Even +though each Qt application seemingly operated as a full-screen application +with the windows managed by the application-local QWS, the use of nitpicker +views still allowed the integration of any number of Qt applications into one +windowed environment. + +With the advent of compositing window managers, the typical way of how an +application interacts with the window system of the OS changed. Whereas +old-generation GUIs relied on a tight interplay of the application with the +window server in order to re-generate newly exposed window regions whenever +needed (e.g., revealing the window content previously covered by another +window), the modern model of a GUI server keeps all pixels of all windows in +memory regardless of whether the window is visible or covered by other +windows. The use of one pixel buffer per window seems wasteful with respect to +memory usage when many large windows are overlapping each other. On the other +hand, this technique largely simplifies GUI servers and makes the +implementation of fancy effects, like translucent windows, straight forward. +Since memory is cheap, the Qt developers abandoned the old method and fully +embraced the buffer-per-window approach by the means of QPA. + +For Genode, we faced the challenge that we don't have a window server +in the usual sense. With nitpicker, we have a GUI server, but with a more +radical design. In particular, nitpicker leaves the management of window +geometries and stacking to the client. In contrast, QPA expects the window +system to provide both means for a user to interactively change the window +layout and a way for an application to define the properties (such as the +geometry, title, and visibility) of its windows. +The obviously missing piece was the software component that deals with window +controls. Fortunately, we already have a bunch of native nitpicker applications +that come with client-side window controls, in particular the so-called liquid +framebuffer (liquid_fb). This nitpicker client presents a virtual framebuffer +in form of a proper window on screen and, in turn, provides a framebuffer and +input service. These services can be used by other Genode processes, for +example, another nested instance of nitpicker. +This way, liquid_fb lends itself to be the interface between the nitpicker +GUI server and QPA. + +For each QPA window, the application creates a new liquid_fb instance as a +child process. The liquid_fb instance will request a dedicated nitpicker +session, which gets routed through the application towards the parent of the +application, which eventually routes the request to the nitpicker GUI server. +Finally, the liquid_fb instance announces its input and framebuffer services +to its parent, which happens to be the application. Now, the application is +able to use those services in order to access the window. Because the +liquid_fb instances are children of the application, the application can +impose control over those. In particular, it can update the liquid_fb +configuration including the window geometry and title at any time. Thanks to +Genode's dynamic reconfiguration mechanism, the liquid_fb instances are able +to promptly respond to such reconfigurations. 

Combined, those mechanisms give the application a way to receive user input
(via the input services provided by the liquid_fb instances), perform
graphical output (via the virtual framebuffers provided by the liquid_fb
instances), and define window properties (by dynamically changing the
respective liquid_fb configurations). At the same time, the user can use
liquid_fb's window controls to move, stack, and resize application windows as
expected.

[image qt5_screenshot]


Steps of porting Qt5
====================

Besides the switch to QPA, the second major change was related to the build
system. For the porting work, we use a Linux host system to obtain the
starting point for the build rules. The Qt4 build system would initially
generate all Makefiles, which could be inspected and processed at once. In
contrast, Qt5 generates Makefiles during the build process whenever needed.
When Qt is configured for Genode, however, the build on Linux will
ultimately fail. So the much-desired intermediate Makefiles won't be created.
The solution was to have 'configure' invoke 'qmake -r' instead of 'qmake'.
This way, qmake project files will be processed recursively. A few additional
tweaks were needed to keep qmake from backing out because of missing
dependencies (qt5_configuration.patch). To enable the build of the Qt tools
out of tree, qmake-specific files had to be slightly adapted
(qt5_tools.patch). Furthermore, qtwebkit turned out to use code-generation
tools quite extensively during the build process. On Genode, we perform this
step during the 'make prepare' phase when downloading and integrating the Qt
source code with the Genode source tree.

For building Qt5 on Genode, we hit two problems. First, qtwebkit depends on
the ICU (International Components for Unicode) library, which was promptly
ported and can be found in the libports repository. Second, qtwebkit
apparently dropped the support of the 'QThread' API in favor of
POSIX-thread support only. For this reason, we had to extend the coverage
of Genode's pthread library to fulfill the needs of qtwebkit.

Once built, we entered the territory of debugging problems at runtime.
* We hit a memory-corruption problem caused by an assumption of 'QArrayData'
  with regard to the alignment of memory allocated via malloc. As a
  work-around, we weakened the assumption to 4-byte alignment
  (qt5_qarraydata.patch).
* Page faults in QWidgetAnimator caused by
  use-after-free problems. Those could be alleviated by adding pointer
  checks (qt5_qwidgetanimator.patch).
* Page faults caused by the slot function 'QWidgetWindow::updateObjectName()'
  with a 'this' pointer of an incompatible type 'QDesktopWidget*'.
  As a workaround, we avoid this condition by delegating the
  'QWidgetWindow::event()' that happened to trigger the slot method to
  'QWindow' (base class of 'QWidgetWindow') rather than to a
  'QDesktopWidget' member (qt5_qwidgetwindow.patch).
* We observed that Arora presented web sites incompletely, or including HTTP
  headers. During the evaluation of HTTP data, a signal was sent to another
  thread, which activated a "user-provided download buffer" for optimization
  purposes. On Linux, the receiving thread was immediately scheduled and
  everything went fine. However, on some kernels used by Genode, scheduling
  is different, so that the original thread continued to execute a bit longer,
  ultimately triggering a race condition.
  As a workaround, we disabled the
  "user-provided download buffer" optimization.
* Page faults in the JavaScript engine of WebKit. The JavaScript
  'RegExp.exec()' function returned invalid string objects. We worked around
  this issue by deactivating the JIT compiler for the processing of
  regular expressions (ENABLE_YARR_JIT).

The current state of the Qt5 port is fairly complete. It covers the core, gui,
jscore, network, script, scriptclassic, sql, ui, webcore, webkit, widgets,
wtf, and xml modules. That said, there are a few known limitations and
differences compared to Qt4. First, the use of one liquid_fb instance per
window consumes more memory compared to the use of QWS in Qt4. Furthermore,
external window movements are not recognized by our QPA implementation yet.
This can cause popup menus to appear at unexpected positions. Key repeat is
not yet handled. The 'QNitpickerViewWidget' is not yet adapted to Qt5. For
this reason, qt_avplay is not working yet.


Test drive
==========

Unlike Qt4, which was hosted in the dedicated 'qt4' repository, Qt5 is
integrated in the libports repository. It can be downloaded and integrated
into the Genode build system by issuing 'make prepare' from within the
libports repository. The Qt5 versions of the known Qt examples are located at
libports/src/app/qt5. Ready-to-use run scripts for those examples are
available at libports/run.


Migration from Qt4 to Qt5
=========================

The support for Qt4 on Genode has been declared as deprecated. By default,
its use is inhibited to avoid name-aliasing problems between both versions.
Any attempt to build a qt4-based target will result in a message:

!Skip target app/qt_launchpad because it requires qt4_deprecated

To re-enable the use of Qt4, the SPEC value qt4_deprecated must be defined
manually for the build directory:

!echo "SPECS += qt4_deprecated" >> etc/specs.conf

We will keep the qt4 repository in the source tree during the current
release cycle. It will be removed with version 13.11.


Light-weight event tracing
##########################

With Genode application scenarios getting increasingly sophisticated,
the need for thorough performance analysis has come into the spotlight.
Such scenarios entail the interaction of many components.
For example, with our recent work on optimizing network performance, we
have to consider several possible attack points:

* Device driver: Is the device operating in the proper mode? Are there
  CPU-intensive operations such as allocations within the critical path?
* Interface to the device driver: How frequent are context switches between
  client and device driver? Is the interface designed appropriately for
  the access patterns?
* TCP/IP stack: How does the data flow from the raw packet level to the
  socket level? How dominant are synchronization costs between the involved
  threads? Are there costly in-band operations performed, e.g., dynamic
  memory allocations per packet?
* C runtime: How does the integration of the TCP/IP stack with the C runtime
  work, for example, how does the socket API interfere with timeout
  handling during 'select' calls?
* Networking application
* Timer server: How often is the timer consulted by the involved components?
  What is the granularity of timeouts and thereby the associated costs for
  handling them?
* Interaction with core: What is the profile of the component's interaction
  with core's low-level services?

This example is just an illustration.
Most real-world performance-critical
scenarios have a similar or even larger scope. With our traditional
tools, it is hardly possible to gather a holistic view of the scenario. Hence,
finding performance bottlenecks tends to be a series of hit-and-miss
experiments, which is a tiresome and costly endeavour.

To overcome this situation, we need the ability to gather traces of component
interactions. Therefore, we started investigating the design of a tracing
facility for Genode one year ago while posing the following requirements:

* Negligible impact on the performance, no side effects:
  For example, performing a system call per traced event
  is out of the question because this would severely influence the flow of
  control (as the system call may trigger the kernel to take a scheduling
  decision) and the execution time of the traced code, not to speak of
  the TLB and cache footprint.

* Kernel independence: We want to use the same tracing facility across
  all supported base platforms.

* Accountability of resources: It must be clearly defined where the
  resources for trace buffers come from. Ideally, the tracing tool should be
  able to dimension the buffers according to its needs and, in turn, pay for
  the buffers.

* Suitable level of abstraction: Only if the trace contains information at
  the right level of abstraction can it be interpreted for large scenarios.
  A counterexample is the in-kernel trace buffer of the Fiasco.OC kernel,
  which logs kernel-internal object names and a few message words when tracing
  IPC messages, but makes it almost impossible to map this low-level
  information to the abstraction of the control flow of RPC calls. In
  contrast, we'd like to capture the names of invoked RPC calls (which is an
  abstraction level the kernel is not aware of). This requirement implies the
  need to have useful trace points generated automatically. Ideally, those
  trace points should cover all interactions of a component with the outside
  world.

* (Re-)definition of tracing policies at runtime: The
  question of which information to gather when a trace point is passed
  should not be solely defined at compile time. Instead of changing static
  instrumentations in the code, we'd prefer to have a way to configure
  the level of detail and possible conditions for capturing events at runtime,
  similar to dtrace. This way, a series of different hypotheses could be
  tested by just changing the tracing policy instead of re-building and
  rebooting the entire scenario.

* Straight-forward implementation: We found that most existing tracing
  solutions are complicated. For example, dtrace comes with a virtual
  machine for the sandboxed interpretation of policy code. Another typical
  source of complexity is the synchronization of trace-buffer accesses.
  Because low TCB complexity is of utmost importance for Genode, the
  simplicity of the implementation is the prerequisite for making it an
  integral part of the base system.

* Support for both online and offline analysis of traces.

We are happy to report to have come up with a design that meets all those
requirements, thanks to the architecture of Genode. In the following, we
present the key aspects of the design.

The tracing facility comes in the form of a new TRACE service implemented
in core.
Using this service, a TRACE client can gather information about
available tracing subjects (existing or no-longer-existing threads),
define trace buffers and policies and assign those to tracing subjects,
obtain access to trace-buffer contents, and control the tracing state
of tracing subjects. When a new thread is created via a CPU session, the
thread gets registered at a global registry of potential tracing sources. Each
TRACE session manages a session-local registry of so-called trace subjects.
When requested by the TRACE client, it queries new tracing sources from the
source registry and obtains references to the corresponding threads. This way,
the TRACE session becomes able to control the thread's tracing state.

To keep the tracing overhead as low as possible, we assign a separate trace
buffer to each individually traced thread. The trace buffer is a shared memory
block mapped to the virtual address space of the thread's process. Capturing
an event comes down to a write operation into the thread-local buffer. Because
of the use of shared memory for the trace buffer, no system call is needed and
because the buffer is local to the traced thread, there is no need for
synchronizing the access to the buffer. When no tracing is active, a thread
has no trace buffer. The buffer gets installed only when tracing is started.
The buffer is not installed from the outside of the traced process
but by the traced thread itself when passing a trace point. To detect
whether to install a new trace buffer, there exists a so-called trace-control
dataspace shared between the traced process and its CPU service. This
dataspace contains control bits for each thread created via the CPU session.
The control bits are evaluated each time a trace point is passed by the
thread. When the thread detects a change of the tracing state, it actively
requests the new trace buffer from the CPU session and installs it into its
address space. The same technique is used for loading the code for tracing
policies into the traced process. The traced thread actively checks for
policy-version updates by evaluating the trace-control bits. If an update is
detected, the new policy code is requested from the CPU session. The policy
code comes in the form of position-independent code, which gets mapped into
the traced thread's address space by the traced thread itself. Once mapped,
a trace point will call the policy code. When called, the policy
module returns the data to be captured into the trace buffer. The
relationship between the trace monitor (the client of the TRACE service),
core's TRACE service, core's CPU service, and the traced process is depicted
in Figure [trace_control].

[image trace_control]

There is one trace-control dataspace per CPU session, which gets accounted
to the CPU session's quota. The resources needed for the
trace-buffer dataspaces and the policy dataspaces are paid for by the
TRACE client. On session creation, the TRACE client can specify the amount
of its own RAM quota to be donated to the TRACE service in core. This
enables the TRACE client to define trace buffers and policies of arbitrary
sizes, limited only by its own RAM quota.

In Genode, the interaction of a process with its outside world is
characterized by its use of inter-process communication, namely synchronous
RPC, signals, and access to shared memory. For the former two types of
inter-process communication, Genode generates trace points automatically.
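
To illustrate the mechanism, the following self-contained C++ sketch mimics
what happens when a traced thread passes such a trace point: a per-thread
control word is inspected and, if tracing is enabled, the event is written
into a thread-local buffer without any system call or lock. All names and the
buffer layout are invented for this illustration and do not correspond to
Genode's actual implementation, which obtains the control bits, the trace
buffer, and the policy module from the CPU session as described above.

! /* conceptual sketch - not Genode's actual tracing implementation */
! #include <cstddef>
! #include <cstring>
! #include <vector>
!
! /* per-thread tracing state, shared with the CPU service (assumed layout) */
! struct Trace_control { bool enabled; unsigned policy_version; };
!
! /* simplified thread-local trace buffer backed by shared memory */
! struct Trace_buffer
! {
!     std::vector<char> data;
!     std::size_t       pos = 0;
!
!     Trace_buffer(std::size_t size) : data(size) { }
!
!     void write(char const *event, std::size_t len)
!     {
!         if (len > data.size()) return;          /* event does not fit */
!         if (pos + len > data.size()) pos = 0;   /* wrap around */
!         std::memcpy(&data[pos], event, len);    /* no lock, no system call */
!         pos += len;
!     }
! };
!
! /* executed by the traced thread itself whenever a trace point is passed */
! void trace_point(Trace_control const &control, Trace_buffer &buffer,
!                  char const *rpc_name)
! {
!     if (!control.enabled)
!         return;            /* tracing disabled: almost zero overhead */
!
!     /* Genode would ask the loaded policy module what to capture here;
!        this sketch simply records the RPC name */
!     buffer.write(rpc_name, std::strlen(rpc_name) + 1);
! }
!
! int main()
! {
!     Trace_control control { true, 1 };
!     Trace_buffer  buffer(1024);
!     trace_point(control, buffer, "Cpu_session::create_thread");
! }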
RPC clients generate trace points when an RPC call is issued and when the call
returns. RPC servers generate trace points in the RPC dispatcher, capturing
incoming RPC requests as well as RPC replies. Thanks to Genode's RPC
framework, we are able to capture the names of the RPC functions in the RPC
trace points. This information is obtained from the declarations of the RPC
interfaces. For signals, trace points are generated for submitting and
receiving signals. Those trace points form a useful baseline for gathering
tracing data. In addition, manual trace points can be inserted into the code.


State of the implementation
===========================

The implementation of Genode's tracing facility has surprisingly low
complexity. The additions to the base system (core as well as the base
library) amount to merely 1500 lines of code. The mechanism works across all
base platforms.

Because the TRACE client provides the policy code and trace buffer to the
traced thread, the TRACE client has ultimate control over the traced
thread. In contrast to dtrace, which sandboxes the trace policy, we express
the policy module in the form of code executed in the context of the traced
thread. However, in contrast to dtrace, such code is never loaded into a large
monolithic kernel, but solely into the individually traced processes. So the
risk of a misbehaving policy is constrained to the traced process.

In the current form, the TRACE service of core should be considered a
privileged service because the trace-subject namespace of each session
contains all threads of the system. Therefore, TRACE sessions should be routed
only for trusted processes. In the future, we plan to constrain the
namespaces for tracing subjects per TRACE session.

The TRACE session interface is located at base/include/trace_session/.
A simple example for using the service is available at os/src/test/trace/
and is accompanied by the run script os/run/trace.run. The test
demonstrates the TRACE session interface by gathering a trace of a thread
running locally in its own address space.


Enhanced multi-processor support
################################

Multi-processor (MP) support is one of those features that most users take for
granted. MP systems are so ubiquitous, even on mobile platforms, that a
limitation to utilizing only a single CPU is almost a fallacy.
That said, MP support in operating systems is hard to get right. For this
reason, we repeatedly deferred the topic on the agenda of Genode's
road map.

For some base platforms such as the Linux or Codezero kernels, Genode
always used to support SMP because the kernel would manage the affinity
of threads to CPU cores transparently to the user-level process. So on
these kernels, there was no need to add special support into the framework.

However, on most microkernels, the situation is vastly different. The
developers of such kernels try hard to avoid complexity in the kernel and
rightfully argue that in-kernel affinity management would contribute to kernel
complexity. Another argument is that, in contrast to monolithic kernels that
have a global view on the system and an "understanding" of the concerns of the
user processes, a microkernel is pretty clueless when it comes to the roles
and behaviours of individual user-level threads.
Not knowing whether a thread
works as a device driver, an interactive program, or a batch process, the
microkernel is not in the position to form a reasonably useful model of the
world, onto which it could intelligently apply scheduling and affinity
heuristics. In fact, from the perspective of a microkernel, each thread does
nothing other than sending and receiving messages, and causing page faults.

For these reasons, microkernel developers tend to provide the bootstrapping
procedure for the physical CPUs and a basic mechanism to assign
threads to CPUs but push the rest of the problem to the user space, i.e.,
Genode. The most straight-forward way would make all physical CPUs visible
to all processes and require the user or system integrator to assign
physical CPUs when a thread is created. However, in the recursively
structured Genode system, we virtualize resources at each level, which
calls for a different approach. Section [Management of CPU affinities]
explains our solution.

When it comes to inter-process communication on MP systems, there is a
certain diversity among the kernel designs. Some kernels allow the user land
to use synchronous IPC between arbitrary threads, regardless of whether both
communication partners reside on the same CPU or on two different CPUs. This
convenient model is provided by Fiasco.OC. However, other kernels do not offer
any synchronous IPC mechanism across CPU cores at all, NOVA being a poster
child of this school of thought. If a user land is specifically designed for
a particular kernel, those peculiarities can simply be delegated to the
application developers. For example, the NOVA user land called NUL is designed
such that a recipient of IPC messages spawns a dedicated thread on each
physical CPU. In contrast, Genode is meant to provide a unified API that
works well across various different kernels. To go forward, we had four
options:

# Not fully supporting the entirety of API semantics across all base
  platforms. For example, we could stick with the RPC API for synchronous
  communication between threads. Programs would just happen to fail on some
  base platforms when the called server resides on a different CPU. This
  would effectively push the problem to the system integrator. The downside
  would be the sacrifice of Genode's nice feature that a program developed
  on one kernel usually works well on other kernels without any changes.

# Imposing the semantics provided by the most restrictive kernel onto all
  users of the Genode API. Whereas this approach would ensure that
  programs behave consistently across all base platforms, the restrictions
  would be artificially imposed onto all Genode users, in particular the
  users of kernels with fewer restrictions. Of course, we don't change
  the Genode API lightheartedly, which contributes to our hesitance to go
  in this direction.

# Hiding kernel restrictions behind the Genode API. This approach could come
  in many different shapes. For example, Genode could transparently spawn a
  thread on each CPU when a single RPC entrypoint gets created, following
  the model of NUL. Or Genode could emulate synchronous IPC using the core
  process as a proxy.

# Adapting the kernel to the requirements of Genode. That is, persuading
  kernel developers to implement the features we find convenient, i.e., adding
  a cross-CPU IPC feature to NOVA. History shows that our track record in
  doing so is not stellar.

Because each of those options is like opening a different can of worms, we
used to defer the solution of the problem. Fortunately, however, we finally
kicked off a series of practical experiments, which led to a fairly elegant
solution, which is detailed in Section
[Adding multi-processor support to Genode on NOVA].


Management of CPU affinities
============================

In line with our experience of supporting
[http://www.genode.org/documentation/release-notes/10.02#Real-time_priorities - real-time priorities]
in version 10.02, we were seeking a way to express CPU affinities such that
Genode's recursive nature gets preserved and facilitated. Dealing with
physical CPU numbers would contradict this mission. Our solution
is based on the observation that most MP systems have topologies that can
be represented on a two-dimensional coordinate system. CPU nodes close
to each other are expected to have a closer relationship than distant nodes.
In a large-scale MP system, it is natural to assign clusters of closely
related nodes to a given workload. Genode's architecture is based on the
idea of recursively virtualizing resources and thereby lends itself to
applying this successive virtualization to the problem of clustering
CPU nodes.

In our solution, each process has a process-local view on a so-called affinity
space, which is a two-dimensional coordinate space. If the process creates a
new subsystem, it can assign a portion of its own affinity space to the new
subsystem by assigning a rectangular affinity location to the subsystem's CPU
session. Figure [affinity_space] illustrates the idea.

[image affinity_space]

Following from the expression of affinities as a rectangular location within a
process-local affinity space, the assignment of subsystems to CPU nodes
consists of two parts: the definition of the affinity-space dimensions as used
by the process, and the association of subsystems with affinity locations
(relative to the affinity space). For the init process, the affinity space is
configured as a sub node of the config node. For example, the following
declaration describes an affinity space of 4x2:

! <config>
!   ...
!   <affinity-space width="4" height="2" />
!   ...
! </config>

Subsystems can be constrained to parts of the affinity space using the
'<affinity>' sub node of a '<start>' entry:

! <config>
!   ...
!   <start name="loader">
!     <affinity xpos="1" ypos="0" width="2" height="1" />
!     ...
!   </start>
!   ...
! </config>

As illustrated by this example, the numbers used in the declarations for this
instance of the init process are not directly related to physical CPUs. If
the machine has just two cores, init's affinity space would be mapped
to the range [0,1] of physical CPUs. However, in a machine
with 16x16 CPUs, the loader would obtain 8x8 CPUs with the upper-left
CPU at position (4,0). Once a CPU session has been created, the CPU client can
request the physical affinity space that was assigned to the CPU session
via the 'Cpu_session::affinity_space()' function. Threads of this CPU session
can be assigned to those physical CPUs via the 'Cpu_session::affinity()'
function, specifying a location relative to the CPU-session's affinity space.


Adding multi-processor support to Genode on NOVA
================================================

The NOVA kernel has been supporting MP systems for a long time. However,
Genode did not leverage this capability until now. The main reason was that
the kernel - intentionally, by decision of the kernel developer - does not
provide the possibility to perform synchronous IPC between threads residing
on different CPUs.
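
The following self-contained sketch illustrates the kind of indirection this
restriction enforces: a caller that cannot reach its server directly hands the
invocation over to a helper thread - which would run on the server's CPU - and
blocks until the helper has performed the call on its behalf. It is a
conceptual illustration in plain C++ using standard-library primitives only;
it is neither Genode nor NOVA code, and CPU affinities are not modeled.

! /* conceptual illustration - not Genode or NOVA code */
! #include <condition_variable>
! #include <functional>
! #include <iostream>
! #include <mutex>
! #include <thread>
!
! /* the caller cannot invoke the server directly (different CPU), so it
!    hands the call to a proxy and blocks until the proxy has finished */
! static void call_via_proxy(std::function<void ()> const &rpc_call)
! {
!     std::mutex              lock;
!     std::condition_variable done;
!     bool                    finished = false;
!
!     /* the proxy performs the actual call on behalf of the caller */
!     std::thread proxy([&] {
!         rpc_call();
!         { std::lock_guard<std::mutex> guard(lock); finished = true; }
!         done.notify_one();               /* wake up the suspended caller */
!     });
!
!     /* the caller is suspended, comparable to blocking on a semaphore */
!     std::unique_lock<std::mutex> guard(lock);
!     done.wait(guard, [&] { return finished; });
!     proxy.join();
! }
!
! int main()
! {
!     call_via_proxy([] { std::cout << "call executed by the proxy\n"; });
! }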

To cope with this situation, Genode servers and clients would need to make
sure to have at least one thread on a common CPU in order to communicate.
Additionally, shared memory and semaphores could be used to communicate across
CPU cores. Both options would require rather fundamental changes to the Genode
base framework and the API. An exploration of this direction should in any
case be pursued in evolutionary steps rather than as one big change, also
taking into account that other kernels do not impose such hard requirements on
inter-CPU communication. To tackle the challenge, we conducted a series of
experiments to add some kind of cross-CPU IPC support to Genode/NOVA.

As a general implication of the missing inter-CPU IPC, messages between
communication partners that use disjoint CPUs must take an indirection through
a proxy process that has threads running on both CPUs involved. The sender
would send the message to a proxy thread on its local CPU, the proxy process
would transfer the message locally to the CPU of the receiver by using
process-local communication, and the proxy thread on the receiving CPU would
deliver the message to the actual destination. We came up with three options
to implement this idea prototypically:

# Core plays the role of the proxy because it naturally has access to all
  CPUs and emulates cross-CPU IPC using the thread abstractions of the
  Genode API.
# Core plays the role of the proxy but uses NOVA system calls directly
  rather than Genode's thread abstraction.
# The NOVA kernel acts as the proxy and emulates cross-CPU IPC directly
  in the kernel.

After having implemented the first prototypes, we reached the following
conclusions.

For options 1 and 2, where core provides this service: If a client cannot
issue a local-CPU IPC, it asks core - actually the pager of the client
thread - to perform the IPC request. Core then spawns or reuses a proxy thread
on the target CPU and performs the actual IPC on behalf of the client. Options
1 and 2 differ only with respect to code size and the question of to whom to
account the required resources - since a proxy thread needs a stack and some
capability selectors.

As one big issue for options 1 and 2, we found that, in order to delegate
capabilities during the cross-CPU IPC, core has to receive capability mappings
to delegate them to the target thread. However, core has no means to know
whether the capabilities must be maintained in core or not. If a capability is
already present in the target process, the kernel would just translate the
capability to the target's capability name space. So core wouldn't need to
keep it. In the other case, where the target receives a previously unknown
capability, the kernel creates a new mapping. Because the mapping gets
established by the proxy in core, core must not free the capability.
Otherwise, the mapping would disappear in the target process. This means that
the use of core as a proxy ultimately leads to leaking kernel resources
because core needs to keep all transferred capabilities, just in case a
new mapping got established.

For option 3, the same general functionality as for options 1 and 2 is
implemented in the kernel instead of in core. If a local-CPU IPC call
fails because of a BAD_CPU kernel error code, the cross-CPU IPC extension
will be used.
The kernel extension creates - similar to options 1 and 2 - a
semaphore (SM), a thread (EC), and a scheduling context (SC) on the remote
CPU and lets the EC run on behalf of the caller thread. The caller thread
gets suspended by blocking on the created semaphore until the remote EC has
finished the IPC. The remote proxy EC reuses the UTCB of the suspended caller
thread as is and issues the IPC call. When the proxy EC returns, it wakes up
the caller via the semaphore. The proxy EC and SC then de-schedule
themselves and the resources get destroyed later on by the kernel's RCU
mechanism. Finally, when the caller thread has been woken up, it takes care
of initiating the destruction of the semaphore.

The main advantage of option 3 compared to options 1 and 2 is that we don't
have to keep and track the capability delegations during a cross-CPU IPC.
Furthermore, we avoid up to two additional address-space switches per
cross-CPU IPC (from the client to core and from core to the server).
Additionally, the UTCB of the caller is reused by the proxy EC and does not
need to be allocated separately as for options 1 and 2.

For these reasons, we decided to go for the third option. From Genode's API
point of view, the use of cross-CPU IPC is completely transparent. Combined
with the affinity management described in the previous section, Genode/NOVA
just works on MP systems.
As a simple example for using Genode on MP systems, there is a ready-to-use
run script available at base/run/affinity.run.


Base framework
##############

Affinity propagation in parent, root, and RPC entrypoint interfaces
===================================================================

To support the propagation of CPU affinities with session requests, the
parent and root interfaces had to be changed. The 'Parent::session()'
and 'Root::session()' functions take the affinity of the session as a new
argument. The affinity argument contains both the dimensions of the affinity
space used by the session and the session's designated affinity location
within the space. The corresponding type definitions can be found at
base/affinity.h.

Normally, the 'Parent::session()' function is not used directly but indirectly
through the construction of a so-called connection object, which represents an
open session. For each session type there is a corresponding connection type,
which takes care of assembling the session-argument string by using the
'Connection::session()' convenience function. To maintain API compatibility,
we kept the signature of the existing 'Connection::session()' function using a
default affinity and added a new overload that takes the affinity as an
additional argument. Currently, this overload is used in
cpu_session/connection.h.

For expressing the affinities of RPC entrypoints to CPUs within the affinity
space of the server process, the 'Rpc_entrypoint' takes the desired affinity
location of the entrypoint as an additional argument. For upholding API
compatibility, the affinity argument is optional.


CPU session interface
=====================

The CPU session interface underwent changes to accommodate the new event
tracing infrastructure and the CPU affinity management.

Originally, the 'Cpu_session::num_cpus()' function could be used to
determine the number of CPUs available to the session. This function
has been replaced by the new 'affinity_space' function, which returns the
bounds of the CPU session's physical affinity space.
In the simplest case of
an SMP machine, the affinity space is one-dimensional, with the width
corresponding to the number of CPUs. The 'affinity' function, which is used to
bind a thread to a specified CPU, has been changed to take an affinity
location as argument. This way, the caller can, in principle, express the
affiliation of the thread with multiple CPUs to guide load balancing in a CPU
service.


New TRACE session interface
===========================

The new event tracing mechanism as described in Section
[Light-weight event tracing] is exposed to Genode processes in the form
of the TRACE service provided by core. The new session interface
is located under base/include/trace_session/. In addition to the new session
interface, the CPU session interface has been extended with functions for
obtaining the trace-control dataspace for the session as well as the trace
buffer and trace policy for a given thread.


Low-level OS infrastructure
###########################

Event-driven operation of NIC bridge
====================================

The NIC bridge component multiplexes one physical network device among
multiple clients. It enables us to multiplex networking on the network-packet
level rather than the socket level and thereby take TCP/IP out of the
critical software stack for isolating network applications. As it
represents an indirection in the flow of all networking packets, its
performance is important.

The original version of the NIC bridge was heavily multi-threaded. In addition
to the main thread, a timer thread, and a thread for interacting with the NIC
driver, it employed one dedicated thread per client. By merging those flows of
control into a single thread, we were able to significantly reduce the number
of context switches and improve data locality. These changes reduced the
impact of the NIC bridge on the packet throughput from 25% to 10%.


Improved POSIX thread support
=============================

To accommodate qtwebkit, we had to extend Genode's pthread library with
working implementations of condition variables, mutexes, and thread-local
storage. The implemented functions are attr_init, attr_destroy, attr_getstack,
attr_get_np, equal, mutex_attr, mutexattr_init, mutexattr_destroy,
mutexattr_settype, mutex_init, mutex_destroy, mutex_lock, mutex_unlock,
cond_init, cond_timedwait, cond_wait, cond_signal, cond_broadcast, key_create,
setspecific, and getspecific.


Device drivers
##############

SATA 3.0 on Exynos 5
====================

The previous release featured the initial version of our SATA 3.0 driver for
the Exynos 5 platform. This driver, located at os/src/drivers/ahci/exynos5,
has reached a fully functional state by now. It supports UDMA-133 with up to
6 GBit/s.

For driver development, we set the goal of reaching performance equal to that
of the Linux kernel. To achieve that goal, we had to make sure to
operate the controller and the disks in the same way as Linux does.
For this reason, we modeled our driver closely after the behaviour of the
Linux driver. That is, we gathered traces of I/O transactions to determine the
initialization steps and the request patterns that Linux performs to access
the device, and used those behavioral traces as a guide for our
implementation. Through step-by-step analysis of the traces, we not only
succeeded in operating the device in the proper modes, but also found
opportunities for further optimization, in particular regarding the
error-recovery implementation.

This approach turned out to be successful. We measured that our driver
generally operates as fast as (and in some cases even significantly faster
than) the Linux driver on solid-state disks as well as on hard disks.


Dynamic CPU frequency scaling for Exynos 5
==========================================

As the Samsung Exynos-5 SoC is primarily targeted at mobile platforms,
power management is an inherent concern. Until now, though, Genode did not
pay much attention to power management. For example, we completely
left out the topic from the scope of the OMAP4 support. With the current
release, we took the first steps towards proper power management on ARM-based
platforms in general, and the Exynos-5-based Arndale platform in particular.

First, we introduced a general interface to regulate clocks and voltages.
Previously, each driver did its own part of configuring clock and power
control registers. The more device drivers were developed, the higher the
chances that they would interfere when accessing those clock or power units.
The newly introduced "Regulator" interface provides the possibility to enable
or disable a regulator, and to set or get its level. A regulator might be
a clock for a specific device (such as a CPU) or a voltage regulator.
For the Arndale board, an exemplary implementation of the regulator interface
exists in the form of the platform driver. It can be found at
os/src/drivers/platform/arndale. Currently, the driver implements
clock regulators for the CPU, the USB 2.0 and USB 3.0 host controllers, the
eMMC controller, and the SATA controller. Moreover, it provides power
regulators for the SATA, USB 2.0, and USB 3.0 host controllers. The selection
of regulators depends on the availability of drivers for the platform because,
without a driver, it wouldn't be possible to test whether changing the clock
and power state affects the device.

Apart from providing regulators needed by certain device drivers, we
implemented a clock regulator for the CPU that allows changing the CPU
frequency dynamically, thereby giving the opportunity to scale down
voltage and power consumption. The possible values range from 200 MHz to
1.7 GHz, whereby the highest value isn't recommended because it might provoke
system crashes due to overheating. Genode's platform driver for Arndale
sets the CPU clock speed to 1.6 GHz by default. When reducing
the clock speed to the lowest level, we observed a power-consumption
reduction of approximately 3 Watt. Besides reducing dynamic power consumption
by regulating the CPU clock frequency, we also explored clock gating and
power gating to further reduce power consumption.

With the CPU frequency scaling in place, we started to close all clock gates
not currently in use. When the platform driver for the Arndale board gets
initialized, it closes everything. If a device driver enables its clock
regulator, all necessary clock gates for the device's clock are opened. This
action saves about 0.7 Watt. The initial closing of all unnecessary power
gates was much more effective. Again, everything not essential for the working
of the kernel is disabled on startup. When a driver enables its power
regulator, all necessary power gates for the device are opened. Closing all
power gates saves about 2.6 Watt.

If we consider all measures taken to save power, we were able to reduce power
consumption to about 59% of the original level without performance
degradation.
When measuring power
consumption after boot-up, with the CPU clock set to 1.6 GHz and both CPU
cores fully loaded, we measured about 8 Watt without the described changes.
With the described power-saving provisions enabled, we measured about
4.7 Watt. When further reducing the CPU clock frequency to 200 MHz, only
1.7 Watt were measured.


VESA driver moved to libports
=============================

The VESA framebuffer driver executes the initialization code located
in the VESA BIOS of the graphics card. As the BIOS code is real-mode code,
the driver uses the x86emu library from X11 as emulation environment.
We updated x86emu to version 1.20 and moved the driver from the 'os'
repository to the 'libports' repository as the library is third-party
code. Therefore, if you want to use the driver, the 'libports'
repository has to be prepared
('make -C <genode-dir>/libports prepare PKG=x86emu') and enabled in
your 'etc/build.conf'.


Runtime environments
####################

Seoul (aka Vancouver) VMM on NOVA
=================================

Since we repeatedly received requests for using the Seoul (formerly known as
Vancouver) VMM on NOVA, we improved the support for this virtualization
solution on Genode. Seoul now supports booting from raw hard-disk images
provided via Genode's block session interface. Whether this image is actually
a file located in memory, comes directly from the hard disk, or comes from a
partition of the hard disk using Genode's part_blk service, is completely
transparent thanks to Genode's architecture.

Additionally, we split up the one large Vancouver run script into several
smaller Seoul run scripts for easier usage - e.g., one for disk, one for
network testing, one for automated testing, and one we call "fancy". The
latter resembles the former vancouver.run script, using Genode's GUI to let
the user start VMs interactively. The run scripts prefixed with 'seoul-' can
be found at ports/run. For the fancy and network scripts, ready-to-use VM
images are provided. Those images are downloaded automatically when executing
the run script for the first time.


L4Linux on Fiasco.OC
====================

L4Linux has been updated from version 3.5.0 to Linux kernel version 3.9.0,
thus providing support for contemporary user lands running on top of L4Linux
on both x86 (32bit) and ARM platforms.


Noux runtime for Unix software
==============================

Noux is our way to use the GNU software stack natively on Genode. To improve
its performance, we revisited the address-space management of the runtime to
avoid redundant revocations of memory mappings when Noux processes are cleaned
up.

Furthermore, we complemented the support for the Genode tool chain to
cover GNU sed and GNU grep as well. Both packages are available in the ports
repository.


Platforms
#########

Fiasco.OC updated to revision r56
=================================

Fiasco.OC and the required L4RE parts have been updated to the current SVN
revision (r56). For us, the major new feature is the support of Exynos SoCs in
the mainline version of Fiasco.OC (www.tudos.org). Therefore, Genode's
implementation of the Exynos5250 platform could be abandoned, leading to less
maintenance overhead of Genode on Fiasco.OC.

Furthermore, Genode's multi-processor support for this kernel has been
improved so that Fiasco.OC users benefit from the additions described in
Section [Enhanced multi-processor support].


NOVA updated
============

In the process of our work on the multi-processor support on NOVA, we updated
the kernel to the current upstream version. Additionally, our customized
branch (called r3) comes with the added cross-CPU IPC system call and
improvements regarding the release of kernel resources.


Integrity checks for downloaded 3rd-party software
##################################################

Even though Genode supports a large variety of 3rd-party software, its
source-code repository contains hardly any 3rd-party source code. Whenever
3rd-party source code is needed, Genode provides automated tools for
downloading the code and integrating it with the Genode environment. As of
now, there exists support for circa 70 software packages, including the
tool chain, various kernels, libraries, drivers, and a few applications. Of
those packages, the code for 13 packages comes directly from their respective
Git repositories. The remaining 57 packages are downloaded in the form of tar
archives from public servers via HTTP or FTP. Whereas we are confident about
the integrity of the code that comes from Git repositories, we are less so
about the archives downloaded from HTTP or FTP servers.

Fortunately, most Open-Source projects provide signature files that allow
the user to verify the origin of the archive. For example, archives of
GNU software are signed with the private key of the GNU project. So the
integrity of the archive can be tested with the corresponding public key.
We used to ignore the signature files for many years but
this has changed now. If there is a signature file available for a package,
the package gets verified right after downloading. If only a hash-sum file
is provided, we check it against a known-good hash sum.

The solution required three steps: the creation of tools for validating
signatures and hashes, the integration of those tools into Genode's
infrastructure for downloading the 3rd-party code, and the definition of
verification rules for the individual packages.

First, new tools for downloading and validating hash sums and signatures were
added in the form of the shell scripts download_hashver (verify hash sum) and
download_sigver (verify signature) found in the tool/ directory. Under the
hood, download_sigver uses GNU GPG, and download_hashver uses the tools
md5sum, sha1sum, and sha256sum provided by coreutils.

Second, hooks for invoking the verification tools were added to the
tool-chain build script as well as to the ports and libports repositories.

The third and most elaborate step was going through all the packages,
looking for publicly available signature files, and adding corresponding
package rules. As of now, this manual process has been carried out for 30
packages, thereby covering half of the archives.

Thanks to Stephan Mueller for pushing us in the right direction, kicking off
the work on this valuable feature, and for the manual labour of revisiting all
the 3rd-party packages!