Mask RCNN for Human Pose Estimation

This project builds on the Mask R-CNN implementation at "https://github.com/matterport/Mask_RCNN", which is written for Python 3, Keras, and TensorFlow. It reproduces the work of "https://arxiv.org/abs/1703.06870" for human pose estimation and aims to address issue#2. When I started, I referred to another project by @RodrigoGantier.

However, RodrigoGantier's project has the following problems:

  • Its code has few comments and still uses the original names from @Matterport's project, which makes the project hard to understand.
  • When I trained this model, I found it hard to get the loss to converge, as described in issue#3.

Requirements

  • Python 3.5+
  • TensorFlow 1.4+
  • Keras 2.0.8+
  • Jupyter Notebook
  • Numpy, skimage, scipy, Pillow, cython, h5py

Getting Started

Discussion

  • I convert the joint coordinates into an integer label in [0, 56*56) and use tf.nn.sparse_softmax_cross_entropy_with_logits as the loss function. This follows the original Detectron code and is the key reason why my loss converges quickly.
  • If you still want to use the keypoint mask as output, you'd better adopt the modified loss function proposed by @QtSignalProcessing in issue#2, because after crop and resize the keypoint masks may have more than one value equal to 1, which makes the original softmax cross-entropy loss hard to converge.
  • Although the loss converges quickly, the prediction results aren't as good as those in the original paper, especially for the left and right shoulders, the left and right knees, etc. I don't yet understand why, so I am releasing the code; any contribution or suggestion to this repository is welcome.
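
The label encoding from the first point can be sketched in plain Python. This is a minimal illustration, not the code from this repository: the helper names (`encode_joint`, `decode_joint`, `sparse_softmax_cross_entropy`) are hypothetical, the row-major layout `y * 56 + x` is an assumption, and the loss is a pure-Python stand-in for what tf.nn.sparse_softmax_cross_entropy_with_logits computes for a single example.

```python
import math

HEATMAP_SIZE = 56  # side length implied by the [0, 56*56) label range


def encode_joint(x, y, size=HEATMAP_SIZE):
    """Map an integer (x, y) heatmap position to one label in [0, size*size).

    Row-major order (y * size + x) is an assumption made for this sketch.
    """
    if not (0 <= x < size and 0 <= y < size):
        raise ValueError("joint lies outside the heatmap")
    return y * size + x


def decode_joint(label, size=HEATMAP_SIZE):
    """Inverse of encode_joint: recover (x, y) from the integer label."""
    y, x = divmod(label, size)
    return x, y


def sparse_softmax_cross_entropy(logits, label):
    """Return -log(softmax(logits)[label]) for one example.

    Uses the log-sum-exp trick (subtracting the max logit) for stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    label = encode_joint(x=10, y=3)
    # One logit per heatmap cell, flattened to a vector of 56*56 classes.
    logits = [0.0] * (HEATMAP_SIZE * HEATMAP_SIZE)
    logits[label] = 5.0  # pretend the network favors the correct cell
    print(label, decode_joint(label))
    print(sparse_softmax_cross_entropy(logits, label))
```

Because each joint has exactly one target class, the target stays a valid single label regardless of what cropping and resizing would do to a binary mask, which is consistent with the convergence behavior described above.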