HKUST-NISL/GEDDnet

GEDDnet: A Network for Gaze Estimation with Dilation and Decomposition

Architecture

Dilated Convolution

We use dilated convolutions to capture high-level features from eye images at high resolution. We replace some of the regular convolutional layers and max-pooling layers of a VGG16 network with dilated convolutional layers that use different dilation rates.
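As a rough illustration of why dilation helps, the sketch below implements a 1-D dilated convolution in NumPy. It is not the network code from this repository; the function name `dilated_conv1d` and the toy input are assumptions made for the example. It shows that a larger dilation rate widens the input span seen by each output without any pooling or striding.

```python
import numpy as np


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D cross-correlation with kernel taps spaced `dilation` samples apart.

    Hypothetical helper for illustration only; not part of GEDDnet.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Number of input samples covered by a single output value.
    span = (len(kernel) - 1) * dilation + 1
    out_len = len(signal) - span + 1
    out = np.zeros(out_len)
    for i in range(out_len):
        # Pick every `dilation`-th sample inside the covered span.
        taps = signal[i : i + span : dilation]
        out[i] = np.dot(taps, kernel)
    return out


x = np.arange(10)
y1 = dilated_conv1d(x, [1, 1, 1], dilation=1)  # each output sees 3 inputs
y2 = dilated_conv1d(x, [1, 1, 1], dilation=2)  # each output sees a span of 5 inputs
```

With the same three weights, the dilated version covers a wider span per output while the output is still sampled at every input position, which is the property the architecture relies on instead of max-pooling.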

Gaze Decomposition

We propose gaze decomposition for appearance-based gaze estimation, which decomposes the gaze estimate into the sum of a subject-independent term estimated from the input image by a deep convolutional network and a subject-dependent bias term.

During training, both the weights of the deep network and the bias terms are estimated. During testing, if no calibration data is available, the bias term can be set to zero. Otherwise, the bias term can be estimated from images of the subject gazing at different gaze targets. The proposed gaze decomposition method enables low-complexity calibration, i.e., calibration using data collected while the subject views only one or a few gaze targets, with only a small number of images per gaze target.
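The decomposition can be sketched in a few lines of NumPy. This is a minimal sketch, not the repository's implementation: the function names are hypothetical, and estimating the bias as the mean residual over the calibration samples is an assumption made here for illustration (the text above only states that the bias is estimated from calibration images).

```python
import numpy as np


def estimate_bias(network_output, true_gaze):
    """Subject-dependent bias from calibration samples.

    Assumption: the bias is the mean of (true gaze - network output).
    Both arguments have shape (num_samples, 2), ordered (pitch, yaw) in radians.
    """
    residual = np.asarray(true_gaze, dtype=float) - np.asarray(network_output, dtype=float)
    return residual.mean(axis=0)


def decompose_predict(network_output, bias=None):
    """Gaze estimate = subject-independent term + subject-dependent bias."""
    network_output = np.asarray(network_output, dtype=float)
    if bias is None:
        # No calibration data available: fall back to a zero bias.
        bias = np.zeros(network_output.shape[-1])
    return network_output + bias


# Toy calibration set: two images of one subject (values are made up).
calib_output = np.array([[0.10, -0.20], [0.05, 0.10]])
calib_truth = np.array([[0.12, -0.17], [0.07, 0.13]])
bias = estimate_bias(calib_output, calib_truth)

test_output = np.array([[0.00, 0.00]])
uncalibrated = decompose_predict(test_output)
calibrated = decompose_predict(test_output, bias)
```

Because the bias is a single 2-vector per subject, it can be estimated from very few samples, which is what makes calibration with one or a few gaze targets feasible.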

Setup

1. Prerequisites

TensorFlow == 1.15

python == 3.7

opencv

2. Datasets

Preprocess the dataset so that it contains:

(1) A 120$\times$120 face image: face_img

(2) Two 80$\times$120 eye images: left_eye_img and right_eye_img

(3) Pitch and yaw gaze angles in radians: eye_angle. Pitch comes first!

(4) An integer that indexes each subject: subject_index. When the images of a subject are flipped horizontally, the index changes to subject_index + total_num_subject.

In train.py, dataset['face_img'] should have shape $N \times 120 \times 120$, dataset['eye_img'] should have shape $N \times 80 \times 120$, dataset['eye_angle'] should have shape $N \times 2$, and dataset['subject_index'] should have shape $N \times 1$.
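Before training, it can help to verify these shapes. The check below is a sketch that assumes the dataset is a dictionary of NumPy arrays keyed exactly as in the sentence above; the helper `check_dataset` is hypothetical and not part of train.py.

```python
import numpy as np

# Expected per-sample shapes, taken from the description above.
EXPECTED_TRAILING_SHAPES = {
    "face_img": (120, 120),
    "eye_img": (80, 120),
    "eye_angle": (2,),       # (pitch, yaw) in radians, pitch first
    "subject_index": (1,),
}


def check_dataset(dataset):
    """Raise ValueError if any array has an unexpected shape; return N otherwise."""
    n = dataset["face_img"].shape[0]
    for key, trailing in EXPECTED_TRAILING_SHAPES.items():
        expected = (n,) + trailing
        actual = tuple(dataset[key].shape)
        if actual != expected:
            raise ValueError(f"{key}: expected shape {expected}, got {actual}")
    return n


# Dummy data with N = 4 samples, only to exercise the check.
dummy = {
    "face_img": np.zeros((4, 120, 120)),
    "eye_img": np.zeros((4, 80, 120)),
    "eye_angle": np.zeros((4, 2)),
    "subject_index": np.zeros((4, 1), dtype=int),
}
num_samples = check_dataset(dummy)
```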

3. Online Data Augmentation

During training, PreProcess.py performs online data augmentation, including random horizontal flipping, rotation, and cropping. The face_img is cropped from 120$\times$120 to 96$\times$96, and the eye_img is cropped from 80$\times$120 to 64$\times$96. The subject_index changes to subject_index + total_num_subject if the image is flipped horizontally.
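The flipping, cropping, and index shift described above can be sketched as follows. This is not the code in PreProcess.py: the function names, the 50% flip probability, and the uniform crop offsets are assumptions, and rotation and any adjustment of the gaze labels under flipping are left out to keep the example short.

```python
import numpy as np


def random_crop(img, out_h, out_w, rng):
    """Crop an (H, W) image to (out_h, out_w) at a random offset."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - out_h + 1)
    left = rng.integers(0, w - out_w + 1)
    return img[top : top + out_h, left : left + out_w]


def augment(face_img, eye_img, subject_index, total_num_subject, rng):
    """Random horizontal flip plus random crop for one sample (hypothetical helper)."""
    flipped = bool(rng.random() < 0.5)  # assumed flip probability
    if flipped:
        face_img = face_img[:, ::-1]
        eye_img = eye_img[:, ::-1]
        # Flipped images are treated as a separate subject.
        subject_index = subject_index + total_num_subject
    face = random_crop(face_img, 96, 96, rng)
    eye = random_crop(eye_img, 64, 96, rng)
    return face, eye, subject_index, flipped


rng = np.random.default_rng(0)
face, eye, idx, flipped = augment(
    np.zeros((120, 120)), np.zeros((80, 120)),
    subject_index=3, total_num_subject=15, rng=rng,
)
```

Since a flipped subject receives its own index, the model needs one bias term for each of the 2 × total_num_subject indices, while `--num_subject` is given as the count before flipping.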

4. Training and Testing

For training, simply run:

cd code
python train.py --num_subject *total_num_subject_ignoring_horizontal_flipping*

For inference, run:

cd code
python infer.py

Note that a trained model (data/models) and an example camera matrix (data/camera_matrix.mat) are provided.

Bibtex

@article{chen2022towards,
 title={Towards High Performance Low Complexity Calibration in Appearance Based Gaze Estimation}, 
 author={Chen, Zhaokang and Shi, Bertram},
 journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
 year={2022},
 volume={},
 number={},
 pages={1-1},
 publisher={IEEE},
 doi={10.1109/TPAMI.2022.3148386}}
