This tool provides client and server side to distribute the execution of an OpenCL benchmark to execution devices available in the network. Mapping considers available machines and corresponding execution devices (CPU or GPU) on those machines. This tool is mainly developed as a demo tool for the CHIST-ERA Projects Seminar 2017 in Brussels.
- The client provides an interface to select an OpenCL benchmark and a corresponding dataset. (GUI and Console)
- It connects to servers on available execution machines on the network.
- It makes the decision on how to distribute the workload to connected machines.
- It receives execution statistics and evaluates and presents them.
- The server receives benchmark and dataset specifications to be executed from a connected client.
- Benchmark binaries and data files need to be available on the server, they are not transmitted.
- It executes the workload.
- It measures execution statistics and returns them.
In this version the available benchmarks are restricted to Rodinia Kmeans and Lud benchmarks. Different data set sizes are specified for each benchmark. Additional benchmarks and dataset sizes can be configured using csv configuration files (see Configuring Benchmarks).
The mapper to be used can be selected on client side with the -m
or -mapperType
argument.
aps the selected benchmark to the default device on the first available server.
Maps the entire selected benchmark to the default device on each available server.
Maps the selected benchmark to each available server using an execution device predicted as best fit in terms of energy consumption and performance.
Code features of the benchmark's OpenCL kernel(s) are passed to a model which returns an optimal GPU or CPU mapping for the kernel. The model is trained for a particular server architecture which can be specified when starting the server. Upon connection with the client, the server transfers an architecture id which allows the client to pick a corresponding workload to execution device prediction model.
For performance reasons predictions are made ahead of execution time and saved in a configuration file which is used at runtime by the client.
The file with ahead of time predictions can be specified using the --offlinePredictions
command line argument upon client startup.
$ java -jar ./build/libs/oclBenchMapper-x.x.x.jar client
Starts the client with default configuration trying to connect to a single server at localhost:9090
, using the
FCFS
workload mapper and a console based user interface.
Multiple server addresses can be configured using the -a
attribute. For example:
$ java -jar ./build/libs/oclBenchMapper-x.x.x.jar client -a 192.168.0.1:9090 192.168.173.4:9091
The client can be executed using a graphical user interface (see Client GUI). For additional help, execute:
$ java -jar ./build/libs/oclBenchMapper-x.x.x.jar client -h
$ java -jar ./build/libs/oclBenchMapper-x.x.x.jar server -A dividend
Starts the server with default benchmark configurations (see Configuring Benchmarks for information on custom benchmark configurations).
It is listening on default port 9090 for client connections which can be specified otherwise using the -p
argument.
-A dividend
specifies the server architecture and the given id is passed on to the client upon connection.
Server simulation is available using the --simulation
command and specifying a file with simulation data (see Non Interactive Client on how to generate simulation data for a specific server architecture).
To see additional help, execute:
$ java -jar ./build/libs/oclBenchMapper-x.x.x.jar server -h
ATTENTION: All configured servers need to be running and accessible before a client is started
The client provides a graphical user interface which can be started using the --gui
flag.
The displayed GUI has control elements in the upper left, a text message output in the upper center, an output for statistic totals in the upooer right and two graphs displaying energy consumption (left) and execution time (right) of executed OpenCL kernels.
- Two combo boxes allow the selection of a benchmark and a corresponding workload.
- Pressing the
Run
button executes the selected benchmark with the given workload on the available servers using the specified workload mapper. - Pressing the
Start Automatic
button initiates an automatic mode. In this mode the client automatically executes the selected benchmark over and over on the available servers. The dataset size is selected randomly for each execution. - Execution results of all servers are combined and displayed in the two charts.
- The
mapper
series shows execution results using the device mapping made by the client. - Additionally, a graph is displayed for each possible combination of CPU or GPU depending on available servers. This allows comparing mapper results to alternative combinations. For example, the mapper executes a workload on the GPU of server 1 and the CPU of server 2. Alternative mappings for all combinations of server 1 and 2 as well as GPU or CPU are displayed for comparisson.
- A checkbox is provided to enable or disable the actual execution of alternative mappings. If enabled, alternative mappings are executed together with the mappers choice for each benchmark execution. If disabled, execution statistics of alternative mappings are taken from a precomputation file and only the mappers choice is executed on the server. By default this tick box is disabled to allow a fluent demo.
- Each chart shows energy and performance results normalised to the best execution device (CPU or GPU) for the executed benchmark and workload combination.
The client can be executed in non-interactive-client
mode.
- In this mode no user interface is provided. The client executes a set of commands provided via input file.
- The client can connect to only a single server in this mode.
- Execution statistics are saved in precomputation file format which is used in the gui to display alternative mappings.
- Also, a comprehensive set of execution statistics is saved as binary file and can be used for server simulation. By executing a server with the flag
--simulation src/main/resources/serverConf/dividend_simulation_stats.dat
- Additionally an optimal execution device to benchmark mapping is generated and saved.
New benchmarks can be configured to be used by the system:
- Add a name for the new benchmark to the benchmark configuration file. This file needs to be available to both client and server.
- Add necessary execution commands to the benchmark execution configuration. The
BinDir
column needs the location of the benchmarks directory relative to the server's working directory. TheExecCmd
column needs the actual command to execute the benchmark minus the name of the data file. - Add data set sizes to the benchmark data configuration. The
DataArgs
column expects a data file location relative to the corresponding benchmark's working directory. This file also needs to be available to both client and server. - To properly use the client's GUI, execution statistics and execution device predictions need to be generated and added to the corresponding configurations (see Non Interactive Client).
- java 8
- AMD power profiling API implemented for OpenCL on server side
This project uses gradle as build tool. A gradle wrapper is provided so gradle itself does not need to be installed on the build machine.
$ ./gradlew build
The initial run will download necessary dependencies and can be slow. Future executions will be much faster.
The generated jar file can be found in ./build/libs/oclBenchMapper-x.x.x.jar
To generate project files for the Eclipse IDE execute:
$ ./gradlew eclipse
To list further build tasks for gradle execute:
$ ./gradlew tasks