
Gesture Recognition Model with Depth Estimation

3D-CNN gesture recognition model

Preprocessing

  • Unify the frame count to 30 per folder.

Since the gestures in the dataset contained a different number of frames in each folder, every folder was processed to hold exactly 30 frames to maintain uniformity. If a folder had more frames, the extras were deleted; if it had fewer, the last frame was duplicated until the total reached 30. A sketch of this step is shown below.
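A minimal sketch of this step, assuming the frames are stored as image files inside per-gesture folders (the 30-frame target comes from the text; the file layout and naming are assumptions):

```python
import os
import shutil

TARGET_FRAMES = 30

def unify_frame_count(folder):
    """Trim or pad a gesture folder to exactly 30 frames."""
    frames = sorted(os.listdir(folder))
    if len(frames) > TARGET_FRAMES:
        # Too many frames: delete the surplus (here, from the end).
        for name in frames[TARGET_FRAMES:]:
            os.remove(os.path.join(folder, name))
    elif len(frames) < TARGET_FRAMES:
        # Too few frames: duplicate the last frame until there are 30.
        last = os.path.join(folder, frames[-1])
        stem, ext = os.path.splitext(frames[-1])
        for i in range(len(frames), TARGET_FRAMES):
            shutil.copy(last, os.path.join(folder, f"{stem}_copy{i}{ext}"))
```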

  • Resize the frames to 64x64 for input.

The frames were resized to 64x64 pixels for input to the network, reducing processing time and memory usage.

  • Convert them to grayscale.

The frames were converted to grayscale so that the model would not be biased toward the skin tones present in the dataset.
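A sketch of the resize and grayscale steps together, assuming OpenCV is used for image handling (the 64x64 target comes from the text; the use of cv2 is an assumption):

```python
import cv2

def preprocess_frame(path):
    """Load a frame, convert it to grayscale, and resize it to 64x64."""
    frame = cv2.imread(path)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (64, 64))
```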

  • Normalization

The pixel values were normalized using scikit-learn's StandardScaler before being fed to the network.
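A sketch of the normalization, assuming the frames have been stacked into a NumPy array; StandardScaler works on 2D data, so the frames are flattened, scaled, and reshaped back (the array shape is an assumption consistent with the steps above):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def normalize_frames(frames):
    """Standardize pixel values to zero mean and unit variance.

    `frames` is assumed to be an (N, 30, 64, 64) array of sequences.
    """
    flat = frames.reshape(len(frames), -1).astype(np.float32)
    scaled = StandardScaler().fit_transform(flat)
    return scaled.reshape(frames.shape)
```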

Model architecture

Model Summary
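As a rough illustration only, a 3D-CNN with this input could be sketched in Keras as below. The framework choice, layer stack, and filter counts are all assumptions, not the actual model; only the input shape (30 grayscale 64x64 frames) and the 6 gesture classes come from this page.

```python
from tensorflow.keras import layers, models

# Illustrative 3D-CNN; layer choices are assumptions. Only the input shape
# (30 frames of 64x64 grayscale) and the 6 output classes come from the text.
model = models.Sequential([
    layers.Conv3D(32, (3, 3, 3), activation="relu",
                  input_shape=(30, 64, 64, 1)),
    layers.MaxPooling3D((2, 2, 2)),
    layers.Conv3D(64, (3, 3, 3), activation="relu"),
    layers.MaxPooling3D((2, 2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(6, activation="softmax"),  # one output per gesture
])
model.summary()
```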

Training

The model was trained on an NVIDIA GTX 1650 Ti (4 GB) GPU for 10 epochs with a batch size of 32.
Number of training images: 1000 folders per gesture × 30 frames × 6 gestures = 180,000
Number of validation images: 124 folders per gesture × 30 frames × 6 gestures = 22,320
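With the illustrative Keras sketch above, a training call matching these settings would look like the following; the epoch count and batch size come from the text, while the optimizer, loss, and variable names are assumptions:

```python
# x_train: (6000, 30, 64, 64, 1) array of sequences (1000 folders x 6 gestures),
# y_train: one-hot gesture labels; x_val/y_val hold the validation split.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=10, batch_size=32)
```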

Open3D Depth Estimation

  • Install CUDA 11.0

Visit https://docs.nvidia.com/cuda/archive/11.0/cuda-installation-guide-microsoft-windows/index.html and install CUDA 11.0 along with the required dependencies.

  • Install PyTorch

Install PyTorch using

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
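A quick check that the CUDA-enabled build installed correctly:

```python
import torch

print(torch.__version__)          # expected: 1.7.1+cu110
print(torch.cuda.is_available())  # True if CUDA 11.0 is set up correctly
```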

  • Install the requirements.

Install the dependencies with pip install -r requirements.txt.

Visit https://github.com/hassony2/manopth, clone the repository, and install manopth from its root using python setup.py install.

Visit https://mano.is.tue.mpg.de/, sign in, and download Models and Code (the downloaded file should have the format mano_v*_*.zip).

Note that all code and data from this download falls under the MANO license.

Unzip it and copy the models folder into the Merged module folder.

Your structure should look like this:

Merged module/
    mano/
        models/
        webuser/

  • Download the checkpoints.

  • Usage.

For a real-time demo, run python visualization.py.

  • Processing.

The model detects hand keypoints from the given input. Open3D is then used to connect the keypoints, building a mesh 1 mm in width along the connections. This forms a 3D model that highlights the keypoints.
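A minimal Open3D sketch of this idea, with placeholder keypoints and connections (in the real pipeline both come from the detection model); the 1 mm width from the text maps to cylinders of 0.5 mm radius along each connection:

```python
import numpy as np
import open3d as o3d

# Placeholder keypoints (x, y, z) in metres and a placeholder connection
# list; the real values come from the detection model.
keypoints = np.random.rand(21, 3)
connections = [(0, 1), (1, 2), (2, 3), (3, 4)]  # e.g. one finger chain

meshes = []
for start, end in connections:
    p0, p1 = keypoints[start], keypoints[end]
    length = np.linalg.norm(p1 - p0)
    direction = (p1 - p0) / length
    # 1 mm wide mesh along each connection -> cylinder of radius 0.5 mm.
    cyl = o3d.geometry.TriangleMesh.create_cylinder(radius=0.0005,
                                                    height=length)
    # Cylinders are created along the z-axis and centred at the origin;
    # rotate to align with the connection (the degenerate -z case is
    # ignored in this sketch), then move to the segment midpoint.
    z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(z, direction)
    if np.linalg.norm(axis) > 1e-8:
        axis = axis / np.linalg.norm(axis)
        angle = np.arccos(np.clip(np.dot(z, direction), -1.0, 1.0))
        R = o3d.geometry.get_rotation_matrix_from_axis_angle(axis * angle)
        cyl.rotate(R, center=(0, 0, 0))
    cyl.translate((p0 + p1) / 2)
    meshes.append(cyl)

o3d.visualization.draw_geometries(meshes)
```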
Example outputs: a hand gesture with 2 fingers and a hand gesture with 5 fingers.