Cricket consists of two parts: a virtualization layer for CUDA applications that isolates the CPU and GPU parts using Remote Procedure Calls, and a checkpoint/restart tool for GPU kernels.
Cricket requires
- CUDA Toolkit (e.g. CUDA 11.1)
- rpcbind
- libcrypto
- libgdb (only for in-kernel C/R)
- libtirpc
libgdb and libtirpc are built as part of the main Makefile.
On the system where the Cricket server should be executed, the appropriate NVIDIA drivers should be installed.
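Before building, it can help to check that the external tools are reachable. The following is a minimal sketch, not part of Cricket: `missing_tools` is a hypothetical helper, and it assumes that `nvcc` (CUDA Toolkit), `rpcbind`, and `nvidia-smi` (NVIDIA driver) are found on `PATH` under those names, which may not hold on every distribution.

```shell
# Hypothetical helper, not part of Cricket:
# print each given command name that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed binary names; rpcbind in particular may live outside a normal user's PATH.
missing_tools nvcc rpcbind nvidia-smi
```

An empty output means all three commands were found. libcrypto is a library rather than a command, so it is not covered by this check.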
git clone https://github.com/RWTH-ACS/cricket.git
cd cricket && git submodule update --init
LOG=INFO make
Environment variables for Makefile:
- LOG: Log level. Can be one of DEBUG, INFO, WARNING, ERROR.
- WITH_IB: If set to YES, build with Infiniband support.
- WITH_DEBUG: Use gcc debug flags for compilation.
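The variables can be combined on one command line. The sketch below is illustrative only: `is_valid_log_level` is a hypothetical helper that accepts just the four documented log levels, and the resulting `make` invocation is printed rather than executed.

```shell
# Hypothetical helper, not part of Cricket: accept only the documented LOG levels.
is_valid_log_level() {
    case "$1" in
        DEBUG|INFO|WARNING|ERROR) return 0 ;;
        *) return 1 ;;
    esac
}

level="DEBUG"
if is_valid_log_level "$level"; then
    # Documented variables combined in one invocation (printed here, not executed).
    echo "LOG=$level WITH_IB=YES make"
fi
```

Run the printed command from the repository root to build with debug logging and Infiniband support.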
By default, Cricket uses TCP/IP as the transport for the Remote Procedure Calls. This enables both remote execution, where server and client run on different systems, and local execution, where server and client run on the same system.
To support Cricket, the CUDA libraries must be linked dynamically to the CUDA application. For the runtime library, this can be done using the '-cudart shared' flag of nvcc.
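One way to confirm that a binary links the CUDA runtime dynamically is to inspect its shared-library dependencies. This is a sketch under assumptions: `uses_shared_cudart` is a hypothetical helper, `ldd` must be available (typical on glibc-based Linux), and `app.cu` is a placeholder file name.

```shell
# Hypothetical helper, not part of Cricket:
# succeed if the binary lists libcudart among its shared-library dependencies.
uses_shared_cudart() {
    ldd "$1" 2>/dev/null | grep -q 'libcudart'
}

# Compile with the shared runtime, then check (app.cu is a placeholder name):
#   nvcc -cudart shared -o app app.cu
#   uses_shared_cudart ./app && echo "CUDA runtime is linked dynamically"
```

A statically linked runtime does not appear in the `ldd` output, so the helper returns a non-zero status for such binaries.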
The Cricket library has to be preloaded into the CUDA application. To start the server:
LD_PRELOAD=<path-to-cricket>/bin/cricket-server.so <cuda-binary>
The client can be started like this:
REMOTE_GPU_ADDRESS=<address-of-server> LD_PRELOAD=<path-to-cricket>/bin/cricket-client.so <cuda-binary>
LD_PRELOAD=/opt/cricket/bin/cricket-server.so /opt/cricket/tests/test_kernel
REMOTE_GPU_ADDRESS=127.0.0.1 LD_PRELOAD=/opt/cricket/bin/cricket-client.so /opt/cricket/tests/test_kernel
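The client invocation can be wrapped in a small shell function to avoid repeating the environment variables. This is a minimal sketch: `run_cricket_client` and `CRICKET_ROOT` are hypothetical names, and the default prefix `/opt/cricket` is only taken from the example paths above.

```shell
# Assumed install prefix (hypothetical variable); adjust to your checkout.
CRICKET_ROOT="${CRICKET_ROOT:-/opt/cricket}"

# Hypothetical wrapper: run a command as a Cricket client
# against the server at the given address.
run_cricket_client() {
    server_address="$1"
    shift
    REMOTE_GPU_ADDRESS="$server_address" \
    LD_PRELOAD="$CRICKET_ROOT/bin/cricket-client.so" \
        "$@"
}

# Usage (the Cricket server must already be running):
#   run_cricket_client 127.0.0.1 "$CRICKET_ROOT/tests/test_kernel"
```

The variables are set only for the wrapped command, so the preloaded library does not leak into the rest of the shell session.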
Compile the application
cd /nfs_share/cuda/samples/5_Simulations/nbody
make NVCCFLAGS="-m64 -cudart shared" GENCODE_FLAGS="-arch=sm_61"
Start the Cricket server
LD_PRELOAD=/nfs_share/cricket/bin/cricket-server.so /nfs_share/cuda/samples/5_Simulations/nbody/nbody
Run the application
REMOTE_GPU_ADDRESS=remoteSystem.my-domain.com LD_PRELOAD=/nfs_share/cricket/bin/cricket-client.so /nfs_share/cuda/samples/5_Simulations/nbody/nbody -benchmark
- cpu: The virtualization layer
- gpu: The checkpoint/restart tool
- submodules: Submodules are located here.
  - cuda-gdb: Modified GDB for use with CUDA. We mostly need the modified libbfd for gathering information from the CUDA ELF.
  - libtirpc: Transport Independent Remote Procedure Calls, required for the virtualization layer.
- tests: Some synthetic CUDA applications to test Cricket.
- utils: A Dockerfile for reproducibility and for our CI.
set cindent
set tabstop=4
set shiftwidth=4
set expandtab
set cinoptions=(0,:0,l1,t0,L3
match ErrorMsg /\s\+$\| \+\ze\t/
This project adheres to the Linux Kernel Coding Style, except when it doesn't.
Etymology: Cricket is an abbreviation for Checkpoint Restart In Cuda KErnels Tool