Cumo (pronounced "koomo") is a CUDA-aware, GPU-optimized numerical library that offers a significant performance boost over Ruby Numo, while (mostly) maintaining drop-in compatibility.
- Ruby 2.5 or later
- NVIDIA GPU Compute Capability 3.5 (Kepler) or later
- CUDA 9.0 or later
Install CUDA and set your environment variables as follows:
export CUDA_PATH="/usr/local/cuda"
export CPATH="$CUDA_PATH/include:$CPATH"
export LD_LIBRARY_PATH="$CUDA_PATH/lib64:$CUDA_PATH/lib:$LD_LIBRARY_PATH"
export PATH="$CUDA_PATH/bin:$PATH"
export LIBRARY_PATH="$CUDA_PATH/lib64:$CUDA_PATH/lib:$LIBRARY_PATH"
To use cuDNN features, install cuDNN and set your environment variables as follows:
export CUDNN_ROOT_DIR=/path/to/cudnn
export CPATH=$CUDNN_ROOT_DIR/include:$CPATH
export LD_LIBRARY_PATH=$CUDNN_ROOT_DIR/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=$CUDNN_ROOT_DIR/lib64:$LIBRARY_PATH
FYI: I use cudnnenv to install cudnn under my home directory like export CUDNN_ROOT_DIR=/home/sonots/.cudnn/active/cuda
.
Add the following line to your Gemfile:
gem 'cumo'
And then execute:
$ bundle
Or install it yourself as:
$ gem install cumo
An example:
[1] pry(main)> require "cumo/narray"
=> true
[2] pry(main)> a = Cumo::DFloat.new(3,5).seq
=> Cumo::DFloat#shape=[3,5]
[[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]]
[3] pry(main)> a.shape
=> [3, 5]
[4] pry(main)> a.ndim
=> 2
[5] pry(main)> a.class
=> Cumo::DFloat
[6] pry(main)> a.size
=> 15
The following find-and-replace should just work:
find . -type f | xargs sed -i -e 's/Numo/Cumo/g' -e 's/numo/cumo/g'
If you want to dynamically switch between Numo and Cumo, something like the following will work:
if gpu
require 'cumo/narray'
xm = Cumo
else
require 'numo/narray'
xm = Numo
end
a = xm::DFloat.new(3,5).seq
The following methods behave incompatibly with Numo by default for performance reasons:
extract
[]
count_true
count_false
Numo returns a Ruby numeric object for 0-dimensional NArray, while Cumo returns the 0-dimensional NArray instead of a Ruby numeric object. Cumo differs in this way to avoid synchronization and minimize CPU ⇄ GPU data transfer.
Set the CUMO_COMPATIBLE_MODE
environment variable to ON
to force Numo NArray compatibility (for worse performance).
You may enable or disable compatible_mode
as:
require 'cumo'
Cumo.enable_compatible_mode # enable
Cumo.compatible_mode_enabled? #=> true
Cumo.disable_compatible_mode # disable
Cumo.compatible_mode_enabled? #=> false
You can also use the following methods which behave like Numo's NArray methods. The behavior of these methods does not depend on compatible_mode
.
extract_cpu
aref_cpu(*idx)
count_true_cpu
count_false_cpu
Set the CUDA_VISIBLE_DEVICES=id
environment variable, or
require 'cumo'
Cumo::CUDA::Runtime.cudaSetDevice(id)
where id
is an integer.
GPU memory pool is enabled by default. To disable it, set CUMO_MEMORY_POOL=OFF
, or:
require 'cumo'
Cumo::CUDA::MemoryPool.disable
See https://github.com/ruby-numo/numo-narray#documentation, replacing Numo with Cumo.
This project is under active development. See issues for future works.
Install ruby dependencies:
bundle install --path vendor/bundle
Compile:
bundle exec rake compile
Run tests:
bundle exec rake test
Generate docs:
bundle exec rake docs
ccache would be useful to speedup compilation time. Install ccache and configure with:
export PATH="$HOME/opt/ccache/bin:$PATH"
ln -sf "$HOME/opt/ccache/bin/ccache" "$HOME/opt/ccache/bin/gcc"
ln -sf "$HOME/opt/ccache/bin/ccache" "$HOME/opt/ccache/bin/g++"
ln -sf "$HOME/opt/ccache/bin/ccache" "$HOME/opt/ccache/bin/nvcc"
Set MAKEFLAGS
to specify make
command options. You can build in parallel as:
bundle exec env MAKEFLAG=-j8 rake compile
bundle exec env CUMO_NVCC_GENERATE_CODE=arch=compute_60,code=sm_60 rake compile
This is useful even on development because it makes it possible to skip JIT compilation of PTX to cubin during runtime.
Compile with debugging enabled:
bundle exec DEBUG=1 rake compile
Run tests with gdb:
bundle exec gdb -x run.gdb --args ruby test/narray_test.rb
You may put a breakpoint by calling cumo_debug_breakpoint()
at C source codes.
--location
option is available as:
bundle exec ruby test/narray_test.rb --location 121
DTYPE
environment variable is available as:
bundle exec DTYPE=dfloat rake compile
bundle exec DTYPE=dfloat ruby test/narray_test.rb
bundle exec CUDA_LAUNCH_BLOCKING=1
Cumo shows warnings if CPU and GPU synchronization occurs if:
export CUMO_SHOW_WARNING=ON
By default, Cumo shows warnings that occurred at the same place only once. To show all, multiple warnings, set:
export CUMO_SHOW_WARNING=ON
export CUMO_SHOW_WARNING_ONCE=OFF
Bug reports and pull requests are welcome on GitHub at https://github.com/sonots/cumo.
- Fast Numerical Computing and Deep Learning in Ruby with Cumo - Presentation Slide at RubyKaigi 2018