
# Design overview of the model and nodes libraries

Machine-learned pipelines make the predictions and statistical inferences needed to build smart devices. The Embedded Learning Library (ELL) is a software library that lets its users design these pipelines and deploy them onto embedded platforms. ELL is written in modern C++, with APIs in Python and JavaScript.

This document provides a high-level technical description of ELL's design and functionality.

## Model

An ELL model represents a complete realtime data processing pipeline, which can be compiled and deployed onto an embedded processor, such as a small ARM microcontroller. The model includes all of the signal processing, feature extraction, prediction, and inference steps required by the application. The input to the model is typically a realtime time series of values generated by sensors, such as video cameras and accelerometers. The output of the model is an event source, generating events that are consumed by the main application logic of the embedded system.

The ELL framework provides two distinct types of functionality:

- The ability to design and author models, including the machine-learning functionality needed to train predictors.
- The ability to compile and deploy models onto embedded platforms.

We illustrate these ideas with a concrete example. Imagine designing the internal logic of a fitness-tracking bracelet for swimmers. Assume that the core functionality of the bracelet is to detect when the user is swimming, to classify the stroke (crawl, breast, butterfly, back), and to count strokes and laps. The hardware used to build such a device would likely include an inertial sensor and a low-power microcontroller, such as an ARM Cortex-M0. A developer would use ELL (say, on a laptop) to create a model and to compile it for the Cortex-M0 target. The model would take the raw sensor values (say, a three-dimensional time series of 14-bit integer values at 50 Hz), apply an appropriately designed band-pass filter, extract time- and frequency-domain features, and run a machine-learned stroke classifier. In parallel, the system could run additional classifiers that detect diving, wall push-offs, turns, and underwater dolphin kicking. All of this functionality would be encompassed by a single model. The model would generate events such as `crawlStroke` and `dolphinKick`. Any subsequent logic, such as counting strokes or taking action based on the detected stroke, would be external to the model and would not be handled by ELL.
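
To make the shape of such a pipeline concrete, here is a minimal C++ sketch of the wiring, using made-up stand-in types. The node names and the `Connect` helper are illustrative assumptions, not the actual ELL API.

```cpp
#include <string>
#include <vector>

// Stand-ins for ELL node types; these names are assumptions, not ELL's API.
struct Node { std::vector<Node*> inputs; };
struct SourceNode : Node {};          // 3-axis inertial sensor samples
struct BandPassFilterNode : Node {};  // isolates the stroke-frequency band
struct FeatureNode : Node {};         // time- and frequency-domain features
struct ClassifierNode : Node {};      // machine-learned stroke classifier
struct SinkNode : Node { std::string eventName; };

// Wire a's output to b's input; returns b so calls can be chained.
template <typename T>
T* Connect(Node* a, T* b) { b->inputs.push_back(a); return b; }

int main()
{
    // Source: three-dimensional time series of 14-bit integers at 50 Hz.
    SourceNode accel;
    BandPassFilterNode filter;
    FeatureNode features;
    ClassifierNode stroke;
    SinkNode crawlStroke{{}, "crawlStroke"};  // event consumed by the app

    // source -> filter -> features -> classifier -> event sink
    Connect(&accel, &filter);
    Connect(&filter, &features);
    Connect(&features, &stroke);
    Connect(&stroke, &crawlStroke);
}
```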

The code that implements the model functionality is found in the `libraries/model` folder, under the `model` namespace.

## Nodes

Technically, a model is a directed acyclic graph of computation units called *nodes*. There are three types of nodes in a model (a minimal sketch in code follows the list below):

- **Functional nodes** represent a unit of computation, with a predefined set of inputs and outputs; each input must be connected to the output of another node.
- **Source nodes** represent sensors and other external signal sources. Source nodes have predefined outputs, but no inputs.
- **Sink nodes** generate events that are consumed by application logic external to ELL. Sink nodes have predefined inputs (which must be connected to the outputs of other nodes), but no outputs.
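
A minimal sketch of this taxonomy, using simplified stand-in port types; the class shapes below are illustrative assumptions, not ELL's actual declarations.

```cpp
#include <vector>

// Simplified stand-ins for ELL's port types (see the Ports section below).
struct InputPort {};
struct OutputPort {};

// Illustrative node taxonomy; not ELL's actual class hierarchy.
struct Node
{
    std::vector<InputPort> inputs;    // each must be fed by another node's output
    std::vector<OutputPort> outputs;
    virtual void Compute() = 0;       // produce output values from input values
    virtual ~Node() = default;
};

struct FunctionalNode : Node  // inputs and outputs: a unit of computation
{
    void Compute() override { /* e.g., filter, FFT, classifier */ }
};

struct SourceNode : Node      // outputs only: wraps a sensor
{
    void Compute() override { /* read the sensor into the output port */ }
};

struct SinkNode : Node        // inputs only: raises events for external logic
{
    void Compute() override { /* fire an event toward the application */ }
};
```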

Some types of functional nodes perform complex operations, such as computing an FFT or evaluating a neural network, while other types perform elementary operations, such as addition or branching. This range of node granularity is key to our design, and its importance will become clear when we describe the process of model refinement. Some functional node types maintain internal state that can change each time the node receives an input.

Examples of implemented node types include:
- `LinearPredictorNode` - implements a linear classifier or regressor
- `ForestNode` - implements an ensemble of decision trees, a.k.a. a decision forest
- `MovingAverageNode` - computes a moving average; note that this node is stateful
- `DelayNode` - buffers its input and relays it as output after a constant delay; note that this node is stateful
- `DotProductNode` - computes the dot product of two vectors
- `L2NormSquaredNode` - computes the squared Euclidean norm of a vector
- `BinaryOperationNode` - takes two arrays of real numbers and performs coordinate-wise addition, multiplication, or a variety of other scalar operations
- `ConstantNode` - outputs a constant
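
As an example of statefulness, here is a hedged sketch of the computation behind a moving-average node, assuming a fixed window size; this is not ELL's implementation of `MovingAverageNode`.

```cpp
#include <cstddef>
#include <deque>

// Sliding-window average; the window is the node's internal state.
class MovingAverage
{
public:
    explicit MovingAverage(std::size_t windowSize) : _windowSize(windowSize) {}

    // Each new sample updates the state and yields the average of the
    // most recent windowSize samples.
    double Step(double sample)
    {
        _window.push_back(sample);
        _sum += sample;
        if (_window.size() > _windowSize)
        {
            _sum -= _window.front();
            _window.pop_front();
        }
        return _sum / _window.size();
    }

private:
    std::size_t _windowSize;
    std::deque<double> _window;  // internal state carried between inputs
    double _sum = 0.0;
};
```

Each call to `Step` both reads and updates the window, which is exactly what distinguishes a stateful node from a purely functional one.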

The different node types are implemented in the `libraries/nodes` folder, in the `nodes` namespace.

## Ports

The inputs and outputs of a node are represented by its input ports and output ports. Each port has a type (real, integer, categorical, bool) and a size (similar to the size of an array). For example, a source node that represents a 3-axis accelerometer has no input ports and a single real-valued output port of size 3. An FFT functional node has one real-valued size-n input port (where n is a user-defined parameter) and two real-valued size-n output ports, one for magnitude and the other for phase. Each input port draws its values from one or more output ports of other nodes: input ports can extract individual elements from output ports and concatenate them.
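
One way to picture this, as a hedged sketch: an input port holds a list of ranges, each referencing a slice of some node's output port, so element extraction and concatenation fall directly out of the representation. The `PortRange` layout below is an illustrative assumption, not ELL's data structure.

```cpp
#include <cstddef>
#include <vector>

enum class PortType { real, integer, categorical, boolean };

struct OutputPort
{
    PortType type;
    std::size_t size;  // number of elements, like an array length
};

// A contiguous slice of some node's output port.
struct PortRange
{
    const OutputPort* source;
    std::size_t start;  // first element to take
    std::size_t count;  // how many elements to take
};

// An input port concatenates one or more slices of upstream output ports.
struct InputPort
{
    PortType type;
    std::vector<PortRange> ranges;

    std::size_t Size() const
    {
        std::size_t total = 0;
        for (const auto& r : ranges) total += r.count;
        return total;
    }
};
```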

## Compilation and Deployment

ELL can cross-compile models down to the machine code of each supported target processor. It does so in three stages:

- Model refinement.
- Emitting the refined model as LLVM-IR. LLVM is an open-source compiler infrastructure that supports cross-compilation, and LLVM-IR is its intermediate program representation.
- Invoking the LLVM compiler to compile the LLVM-IR for the target platform.

### Emitting IR versus Refinement

Some nodes have the ability to emit LLVM-IR code that implements their functionality. These nodes are called *compilable nodes*. Other nodes are too complex to emit directly; instead, they implement their functionality using a combination of finer-grained nodes, through a process called *node refinement*. The model refines its non-compilable nodes recursively, until all of its nodes become compilable. Some nodes can both emit LLVM-IR and refine themselves, and ELL chooses between the two options according to the node parameters and the capabilities of the target platform.
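
A sketch of how this dispatch might look, assuming a simplified node interface; the method names (`IsCompilable`, `Refine`) are illustrative, and the rewiring of port connections between old and new nodes is elided.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Illustrative emit-or-refine interface; not ELL's actual API.
struct Node
{
    virtual ~Node() = default;
    virtual bool IsCompilable() const = 0;  // can this node emit LLVM-IR?
    virtual std::vector<std::unique_ptr<Node>> Refine() const { return {}; }
};

// Repeatedly replace non-compilable nodes with their finer-grained
// refinements until every node in the graph can emit LLVM-IR.
void RefineUntilCompilable(std::vector<std::unique_ptr<Node>>& nodes)
{
    bool refined = true;
    while (refined)
    {
        refined = false;
        std::vector<std::unique_ptr<Node>> next;
        for (auto& node : nodes)
        {
            if (node->IsCompilable())
            {
                next.push_back(std::move(node));
            }
            else
            {
                for (auto& finer : node->Refine())  // substitute finer nodes
                    next.push_back(std::move(finer));
                refined = true;
            }
        }
        nodes = std::move(next);
    }
}
```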

We illustrate this process with an example. Imagine that we start with a model that contains a single source node, whose output is connected to a `LinearPredictorNode`, whose output is connected to a sink node. The `LinearPredictorNode` is too complex to emit itself as LLVM-IR, but it can refine itself into a combination of a `DotProductNode`, a `SumNode`, and two `ConstantNode`s. A `DotProductNode` can either emit itself as LLVM-IR or refine itself into multiple nodes that implement scalar multiplication plus another `SumNode`, depending on the size of the predictor and the characteristics of the target platform. The result is executable code that is optimized for each target platform, including the ability to take advantage of platform-specific instruction sets such as AVX, SSE, and NEON.
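
Spelled out, the refinement implements the linear predictor y = w·x + b: the two `ConstantNode`s hold the weights w and the bias b, the `DotProductNode` computes w·x, and the `SumNode` adds the bias. Below is a sketch of the computation the refined subgraph performs, not the code ELL actually generates.

```cpp
#include <cstddef>
#include <vector>

// y = w . x + b: the loop plays the role of the DotProductNode and the
// final addition plays the role of the SumNode; w and b come from the
// two ConstantNodes.
double LinearPredict(const std::vector<double>& w,
                     const std::vector<double>& x,
                     double b)
{
    double dot = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i)
        dot += w[i] * x[i];
    return dot + b;
}
```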