Gesture Recognition Using Neural Networks with Google's Project Soli Sensor
Dataset and trained model are now available.
This is the open source evaluation code base of our paper:
Interacting with Soli: Exploring Fine-Grained Dynamic Gesture Recognition in the Radio-Frequency Spectrum
Saiwen Wang, Jie Song, Jamie Lien, Ivan Poupyrev, Otmar Hilliges
(link to the paper)
Please cite the paper if you find the code useful. Thank you. (BibTeX)
This project uses Google's Project Soli sensor.
Soli is a new sensing technology that uses miniature radar to detect touchless gesture interactions.
Soli sensor technology works by emitting electromagnetic waves in a broad beam. Objects within the beam scatter this energy, reflecting some portion back towards the radar antenna. Properties of the reflected signal, such as energy, time delay, and frequency shift, capture rich information about the object’s characteristics and dynamics, including size, shape, orientation, material, distance, and velocity.
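Two of these properties map onto standard radar relations: the round-trip time delay gives the distance to the reflector, and the Doppler frequency shift gives its radial velocity. These are the quantities along the range and Doppler axes of a Range-Doppler Image. The sketch below is illustrative only and not part of this code base; the 60 GHz carrier is an assumption chosen because Soli operates in the 60 GHz band, and the function names are hypothetical.

```python
# Sketch: mapping time delay and Doppler shift to distance and velocity
# for a monostatic radar. CARRIER_HZ is an assumed value for illustration.
SPEED_OF_LIGHT = 299792458.0  # metres per second
CARRIER_HZ = 60e9             # assumed carrier frequency (60 GHz band)


def range_from_delay(round_trip_delay_s):
    """Distance to the reflector; the wave travels there and back, so halve it."""
    return SPEED_OF_LIGHT * round_trip_delay_s / 2.0


def radial_velocity_from_doppler(doppler_shift_hz, carrier_hz=CARRIER_HZ):
    """Radial velocity from the Doppler shift: v = f_d * wavelength / 2."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    return doppler_shift_hz * wavelength / 2.0


# A hand 30 cm away produces a round-trip delay of about 2 ns.
delay = 2 * 0.30 / SPEED_OF_LIGHT
print(range_from_delay(delay))              # about 0.30 m
# A 200 Hz shift on a 60 GHz carrier corresponds to about 0.5 m/s.
print(radial_velocity_from_doppler(200.0))
```

The factor of two in both functions comes from the same source: the signal covers the distance twice, once out and once back.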
Our paper uses a lightweight architecture that combines Convolutional and Recurrent Neural Networks, trained end to end. It recognizes 11 in-air gestures with 87% per-frame accuracy and can perform real-time predictions at 140 Hz on commodity hardware. (link to the paper video)
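Per-frame accuracy can be read as the fraction of individual frames, pooled over all test sequences, whose predicted label matches the ground truth. The sketch below assumes that definition and assumes every frame carries its own ground-truth label; the exact evaluation protocol is described in the paper, and `per_frame_accuracy` is a hypothetical helper. It also shows the time budget per prediction implied by 140 Hz.

```python
# Sketch: per-frame accuracy, pooled over all frames of all test sequences.
# Assumes every frame has its own ground-truth label (hypothetical helper).


def per_frame_accuracy(predictions, labels):
    """predictions, labels: lists of sequences, each a list of per-frame labels."""
    correct = 0
    total = 0
    for pred_seq, label_seq in zip(predictions, labels):
        if len(pred_seq) != len(label_seq):
            raise ValueError("each prediction sequence must match its label sequence")
        correct += sum(1 for p, t in zip(pred_seq, label_seq) if p == t)
        total += len(label_seq)
    if total == 0:
        raise ValueError("no frames to evaluate")
    return correct / float(total)


# At 140 Hz, each prediction has a budget of about 1000 / 140 = 7.1 ms.
FRAME_BUDGET_MS = 1000.0 / 140

preds = [[3, 3, 3, 5], [0, 0, 1]]
truth = [[3, 3, 3, 3], [0, 0, 0]]
print(per_frame_accuracy(preds, truth))  # 5 of 7 frames correct
```

Pooling frames, rather than averaging per-sequence accuracies, weights longer sequences more heavily; that is one reason per-frame and per-sequence numbers can differ.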
- Python 2: HDF5 and OpenCV 2 interfaces for Python.
- C++: HDF5, OpenCV 2, Boost
- LuaJIT and Torch 7.
- Torch 7 packages: class, cunn and cutorch (GPU support), mattorch (Matlab support), lunajson (JSON support), image (Torch image library).
- Please note that mattorch is an outdated package that is no longer maintained.
- Preprocessing (HDF5 to images):
python pre/main.py --op image --file [dataset folder] \
    --target [target image folder] --channel 4 --originsize 32 --outsize 32
- Preprocessing (generate mean file):
python pre/main.py --op mean --file [image folder] \
    --target [mean file name] --channel 4 --outsize 32
- Load model and evaluate:
th net/main.lua --file [image folder] --list [train/test sequence split file] \
    --load [model file] --inputsize 32 --inputch 4 --label 13 --datasize 32 \
    --datach 4 --batch 16 --maxseq 40 --cuda --cudnn
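A mean file is commonly a pixel-wise mean image computed over the training frames and subtracted from every input before it reaches the network. This sketch does not reproduce exactly what `pre/main.py --op mean` writes; it only illustrates the idea, assuming frames shaped (frames, channels, 32, 32) to match `--channel 4 --outsize 32`. Both function names are hypothetical.

```python
import numpy as np


def compute_mean_image(sequences, channels=4, size=32):
    """Pixel-wise mean over every frame of every sequence, per channel.

    sequences: iterable of arrays shaped (frames, channels, size, size).
    """
    total = np.zeros((channels, size, size), dtype=np.float64)
    count = 0
    for seq in sequences:
        seq = np.asarray(seq, dtype=np.float64)
        total += seq.sum(axis=0)   # sum over the frame axis
        count += seq.shape[0]
    if count == 0:
        raise ValueError("no frames given")
    return total / count


def subtract_mean(seq, mean_image):
    """Zero-centre one sequence; the mean image broadcasts over the frame axis."""
    return np.asarray(seq, dtype=np.float64) - mean_image
```

Accumulating a running sum and frame count keeps memory constant, so the whole training set never has to be loaded at once.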
- Download dataset (please let me know if the link doesn't work).
- The train/test split file (in JSON format) we used is stored in the repo at config/file_half.json.
- The dataset contains multiple preprocessed Range-Doppler Image sequences. Each sequence is saved as a single HDF5 data file, named [gesture ID]_[session ID]_[instance ID].h5. The Range-Doppler Image data of a specific channel is stored in the dataset named ch[channel ID], and the label in the dataset named label. Each Range-Doppler Image data array has shape [number of frames] * 1024 and can be reshaped back into a sequence of 32 * 32 2D Range-Doppler Images.
- Simple Python code to access the data:
# Demo code to extract data in python
import h5py

use_channel = 0
# file_name: path to one sequence file, named [gesture ID]_[session ID]_[instance ID].h5
with h5py.File(file_name, 'r') as f:
    # Data and label are numpy arrays
    data = f['ch{}'.format(use_channel)][()]
    label = f['label'][()]
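Building on the demo above, the following sketch parses the file-name convention and stacks all four channels into one array of 32 * 32 images. It assumes the 1024 values of each frame are stored in row-major (NumPy default) order, so a plain reshape restores the 2D image. The helper names are hypothetical; `load_frames` accepts an open `h5py.File` or any mapping from dataset names to arrays.

```python
import os

import numpy as np


def parse_file_name(path):
    """Split '[gesture ID]_[session ID]_[instance ID].h5' into three ints."""
    stem = os.path.splitext(os.path.basename(path))[0]
    gesture_id, session_id, instance_id = (int(part) for part in stem.split('_'))
    return gesture_id, session_id, instance_id


def load_frames(f, num_channels=4, size=32):
    """Stack datasets ch0 .. ch{num_channels - 1} into (frames, channels, size, size).

    f: an open h5py.File, or any mapping from dataset names to arrays.
    """
    per_channel = [np.asarray(f['ch{}'.format(c)][()]) for c in range(num_channels)]
    data = np.stack(per_channel, axis=1)  # (frames, channels, size * size)
    # Assumes row-major storage, so reshape restores each 32 * 32 image.
    return data.reshape(data.shape[0], num_channels, size, size)


# Usage with a real file (requires h5py):
#     with h5py.File(file_name, 'r') as f:
#         frames = load_frames(f)
#         label = f['label'][()]
#     gesture_id, session_id, instance_id = parse_file_name(file_name)
```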
- Dataset session arrangement for evaluation.
- 11 (gestures) * 25 (instances) * 10 (users) for cross-user evaluation: session 2 (25), 3 (25), 5 (25), 6 (25), 8 (25), 9 (25), 10 (25), 11 (25), 12 (25), 13 (25).
- 11 (gestures) * (50 (instances) * 4 (sessions) + 25 (instances) * 2 (sessions)) for single-user cross-session evaluation: session 0 (50), 1 (50), 4 (50), 7 (50), 13 (25), 14 (25).
- Please refer to the paper for details of the gesture collection campaign.
- The gestures are listed in the table below. Each column represents one gesture, with snapshots of three key steps of that gesture. The gesture label is indicated by the number in the circle above. Please note that the gesture label order differs from the one in the paper, because the gestures are regrouped in the paper. Sequences with gesture ID 11 are background signals with no hand present.
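The session lists above can be turned into evaluation folds. The sketch below is a hypothetical leave-one-session-out generator; assuming each cross-user session corresponds to one user, holding out one session there amounts to holding out one user. It is not necessarily the protocol used in the paper; the train/test split used by the evaluation code is config/file_half.json. The session lists are copied verbatim from the arrangement above, where session 13 appears in both lists.

```python
import os

# Copied from the session arrangement above (session 13 is listed in both).
CROSS_USER_SESSIONS = [2, 3, 5, 6, 8, 9, 10, 11, 12, 13]
SINGLE_USER_SESSIONS = [0, 1, 4, 7, 13, 14]
BACKGROUND_GESTURE_ID = 11  # sequences with no hand present


def _ids(file_name):
    """[gesture ID, session ID, instance ID] from the file-name convention."""
    stem = os.path.splitext(os.path.basename(file_name))[0]
    return [int(part) for part in stem.split('_')]


def is_background(file_name):
    return _ids(file_name)[0] == BACKGROUND_GESTURE_ID


def leave_one_session_out(file_names, sessions):
    """Yield (held_out_session, train_files, test_files), one fold per session.

    Only files whose session is in `sessions` are used.
    """
    for held_out in sessions:
        train = [n for n in file_names
                 if _ids(n)[1] in sessions and _ids(n)[1] != held_out]
        test = [n for n in file_names if _ids(n)[1] == held_out]
        yield held_out, train, test
```

Passing SINGLE_USER_SESSIONS instead yields cross-session folds for the single-user setting.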
- Download model
- The trained proposed model; please refer to the paper for model details.
- Simple Lua (Torch 7) code to load the model:
require 'cudnn'
require 'rnn'
loadFile = 'uni_image_np_50.t7'
net = torch.load(loadFile)
print(net)
- The model uses layers from the cudnn package, so cudnn must be installed to load it.
This is a simplified version of the original code base we used for all the experiments in the paper. The complex Torch-based class hierarchies in the net folder reflect the various model architectures we tried during the experiments. For simplicity, we have only made the evaluation part public. Model details can be found in both the paper and the model file.
This project is licensed under the terms of the MIT license.