This repository contains basic examples to familiarize yourself with the PULP-TrainLib framework and the topic of On-Device Learning. The training library is written in C and optimized for RISC-V Parallel Ultra-Low Power processors, i.e., Microcontrollers (MCUs). We will use the open-source PULP platform as the target device, leveraging the platform simulator (GVSOC) included in the PULP-SDK.
The tutorial has been tested on an Ubuntu 20.04 LTS machine (we used Windows WSL with Ubuntu 20.04 LTS).
On a fresh machine, updating the package list may be required:
sudo apt-get update
The following packages need to be installed:
sudo apt-get install -y make python-is-python3 build-essential git libftdi-dev libftdi1 doxygen python3-pip libsdl2-dev curl cmake libusb-1.0-0-dev scons gtkwave libsndfile1-dev rsync autoconf automake texinfo libtool pkg-config libsdl2-ttf-dev
We also recommend using Anaconda or Miniconda to create a conda environment for Python (3.8), e.g.:
conda create --name trainlib-tutorial python=3.8
conda activate trainlib-tutorial
Required Python packages (for GVSOC):
pip install --user argcomplete pyelftools
PULP-TrainLib uses Pytorch for generating test vectors and checking the results:
python -m pip install argparse six
python -m pip install torch torchvision torchaudio
python -m pip install torchsummary
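To quickly check that the Python environment is functional, you can optionally verify that PyTorch imports correctly and prints its version:
python -c "import torch; print(torch.__version__)"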
GCC <= 9.4 is required. To check if gcc has the right version:
gcc --version
Please refer to the official guide to update gcc if needed.
To install the PULP-SDK and the PULP-TrainLib library, you can run this script:
source install_ub20.sh
The script will clone the PULP-SDK and PULP-TrainLib submodules, as well as the RISC-V compiler.
Once the installation is complete, do not forget to close the installation terminal and open a new one.
IMPORTANT: Every time a new terminal is opened, run the setup script from the top directory of the repository:
source setup.sh
You may also need to re-activate the conda environment that was created while installing the requirements:
conda activate trainlib-tutorial
To check whether the installation was successful, you can try to run a Hello World test:
cd pulp-sdk/tests/hello/
make clean all run
where:
- clean: removes the build folder
- all: compiles the code
- run: executes the binary on the GVSOC simulator
If the execution is successful, the following string, printed by the Fabric Controller (FC), will appear in the terminal:
Hello from FC
In case of any issue, you can refer to the PULP-SDK documentation or open an issue in this repository.
You can refer to the instructions inside each Ex folder to run the provided examples. These examples will be shown and explained during the tutorial at DATE24: ET02 On-Device Continual Learning Meets Ultra-Low Power Processing.
Although it is not mandatory, we suggest installing and using VSCode to navigate the tutorial repository and run the code from the terminal, e.g., VSCode on Ubuntu WSL.
PULP-TrainLib is the first On-Device Learning library optimized for RISC-V Multi-Core MCUs, tailored for the Parallel Ultra-Low Power (PULP) Platform.
On-Device Learning is a novel paradigm for enabling Deep Neural Network (DNN) Training on extreme-edge devices. This paradigm:
- ensures privacy, since personal data is not shared with third-party compute resources, and, at the same time, reduces network traffic and potential congestion;
- reduces the latency of model updates compared to waiting for a server response.
The PULP Platform is a fully open-source (hardware and software) scalable platform for extreme-edge computing based on RISC-V cores.
An example of a PULP-based System-on-Chip (SoC) is shown in this figure:
In this embodiment, the Cluster features 8 parallel cores (Cores 0 to 7) for computation, plus a ninth core (Core 8) that acts as the Cluster Controller.
PULP-TrainLib makes efficient use of the available resources of the PULP-based SoCs, which feature:
- A single Core (Fabric Controller) for the control of the system
- A Cluster of N RISC-V Cores capable of computing parallel tasks
- A hierarchical memory system, featuring a Cluster-reserved fast L1 memory and a system-level L2 memory
- A Cluster DMA to access the L2 memory from the Cluster in a few cycles
- Tightly coupled accelerators, such as a Mixed-Precision FPU, available to the Cluster
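As a reference for how these resources interact, the following minimal sketch (illustrative only, not taken from the tutorial sources) shows the typical PMSIS pattern in which the Fabric Controller powers up the Cluster and offloads a task to it:

```c
#include "pmsis.h"

/* Entry point executed on the Cluster (dispatched by the Cluster Controller) */
static void cluster_entry(void *arg)
{
    printf("Hello from Cluster core %d\n", pi_core_id());
}

/* Runs on the Fabric Controller */
void offload_example(void)
{
    struct pi_device cluster_dev;
    struct pi_cluster_conf conf;
    struct pi_cluster_task task;

    pi_cluster_conf_init(&conf);                /* default Cluster configuration */
    pi_open_from_conf(&cluster_dev, &conf);
    if (pi_cluster_open(&cluster_dev)) return;  /* power up the Cluster */

    /* Offload cluster_entry to the Cluster and wait for completion */
    pi_cluster_send_task_to_cl(&cluster_dev,
                               pi_cluster_task(&task, cluster_entry, NULL));

    pi_cluster_close(&cluster_dev);
}
```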
The platform used in this tutorial features the following specs:
- 8 Cluster Cores
- 8 Mixed-Precision FPUs (FP32, FP16), one per Cluster Core
- 1.5 MB of L2 memory
- 256 kB of L1 memory
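To make the memory hierarchy and the Cluster DMA more concrete, here is a minimal sketch based on the standard PMSIS DMA API; the buffer names and size are hypothetical:

```c
#include "pmsis.h"

#define N_ELEMS 1024

/* Hypothetical buffers: x_l2 resides in system L2, x_l1 in the Cluster L1 */
PI_L2 float x_l2[N_ELEMS];
PI_L1 float x_l1[N_ELEMS];

/* Called from a Cluster core: fetch the input into L1 before computing */
static void fetch_input(void)
{
    pi_cl_dma_cmd_t cmd;
    /* Asynchronous 1D copy, external (L2) -> local (L1) */
    pi_cl_dma_cmd((uint32_t) x_l2, (uint32_t) x_l1,
                  N_ELEMS * sizeof(float), PI_CL_DMA_DIR_EXT2LOC, &cmd);
    pi_cl_dma_cmd_wait(&cmd);   /* block until the transfer completes */
}
```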
PULP-TrainLib is available as open source here. You can refer to the README.md for more details.
In short, PULP-TrainLib is organized as follows:
- lib/
  - include/
  - sources/
- tests/
  - test_<layer/function>_<possible_options>
- tools/
  - AutoTuner/
  - TrainLib_Deployer/
PULP-TrainLib is written in C code, with specific calls to the PULP PMSIS libraries for parallel execution.
To include PULP-TrainLib in your project, you have to add #include "pulp_train.h" to your application code.
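As an illustration of the usage pattern, here is a minimal sketch of a parallel kernel written in the style of the library. The kernel, its argument structure, and all names below are hypothetical, not actual PULP-TrainLib APIs; pi_cl_team_fork, however, is the standard PMSIS primitive for parallel execution on the Cluster:

```c
#include "pmsis.h"
#include "pulp_train.h"

#define NUM_CORES 8

/* Hypothetical argument structure for a toy elementwise kernel */
struct scale_args {
    float *data;
    int    size;
    float  factor;
};

/* Worker executed by all Cluster cores: each core scales a disjoint chunk */
static void scale_worker(void *void_args)
{
    struct scale_args *args = (struct scale_args *) void_args;
    int chunk = (args->size + NUM_CORES - 1) / NUM_CORES;
    int start = pi_core_id() * chunk;
    int stop  = (start + chunk < args->size) ? start + chunk : args->size;
    for (int i = start; i < stop; i++)
        args->data[i] *= args->factor;
}

/* Called on the Cluster: fork the worker on all cores and wait */
static void scale_parallel(struct scale_args *args)
{
    pi_cl_team_fork(NUM_CORES, scale_worker, args);
}
```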
The project is open-source. If you want to contribute, open a pull request on the official repository (https://github.com/pulp-platform/pulp-trainlib) or contact the maintainers. We are willing to collaborate on the project!
D. Nadalini, M. Rusci, G. Tagliavini, L. Ravaglia, L. Benini, and F. Conti, "PULP-TrainLib: Enabling On-Device Training for RISC-V Multi-Core MCUs through Performance-Driven Autotuning," SAMOS Pre-Print Version, Springer Published Version
D. Nadalini, M. Rusci, L. Benini, and F. Conti, "Reduced Precision Floating-Point Optimization for Deep Neural Network On-Device Learning on MicroControllers," ArXiv Pre-Print