This work is based on the visual-pushing-grasping project [1], which controls a UR5 robot in CoppeliaSim (V-REP).
- I made major changes focused on reducing computational complexity, by using a lightweight network and a different way of modeling the action space and reward.
- Update test script and pretrained weights
- Test result video
- Grasping action only (no pushing)
- Using MobileNetV2 as the backbone with 2 prediction heads (one for 16 grasp orientations, one for the 112x112 grasp-location map); see the model sketch below
- Update evaluation script
- Use ROS to replace the V-REP Python API
- Increase the location map to 224x224 to improve precision
- Add one more prediction head for pushing/grasping
- End-to-end pipeline with a single branch; DenseNet-121 is replaced with MobileNetV2
- No input rotation; the action space is modeled as a 3D tensor of 112x112x16 (height-map resolution = 4 mm, 16 rotation angles)
- Only RGB is used as the network input; depth information is used only for the grasp z position (see the pose-decoding sketch further below)
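For illustration, a minimal PyTorch sketch of such a two-head model is shown below. The class and layer names (`TwoHeadGraspNet`, `loc_head`, `rot_head`) and the exact head designs are assumptions for readability, not necessarily the architecture used in `train_twoheadgraspnet.py`:

```python
import torch
import torch.nn as nn
import torchvision

class TwoHeadGraspNet(nn.Module):
    """Illustrative two-head grasp network: MobileNetV2 backbone,
    one head for the 112x112 grasp-location map, one for 16 orientations."""

    def __init__(self, num_rotations=16, map_size=112):
        super().__init__()
        # MobileNetV2 feature extractor (1280-channel output feature map);
        # ImageNet weights can be loaded here depending on the torchvision version.
        self.backbone = torchvision.models.mobilenet_v2().features
        # Head 1: dense Q-map over grasp locations, upsampled to map_size x map_size.
        self.loc_head = nn.Sequential(
            nn.Conv2d(1280, 128, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),
            nn.Upsample(size=(map_size, map_size), mode="bilinear", align_corners=False),
        )
        # Head 2: one Q-value per discrete grasp orientation.
        self.rot_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(1280, num_rotations),
        )

    def forward(self, rgb):                      # rgb: (B, 3, H, W), e.g. 224x224
        feats = self.backbone(rgb)               # (B, 1280, H/32, W/32)
        loc_q = self.loc_head(feats).squeeze(1)  # (B, 112, 112) location Q-map
        rot_q = self.rot_head(feats)             # (B, 16) orientation Q-values
        return loc_q, rot_q
```

A forward pass on a dummy batch, e.g. `loc_q, rot_q = TwoHeadGraspNet()(torch.zeros(1, 3, 224, 224))`, returns a `(1, 112, 112)` location Q-map and a `(1, 16)` orientation vector.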
- The simulation scene during training: after training with `train_twoheadgraspnet.py`, the robot successfully learns to find objects and grasp them. Due to the limited resolution (4 mm instead of 2 mm in the original work), the location prediction is sometimes inaccurate. Because there is no pushing action, the robot has difficulty handling complex scenes. The scene was recorded during the training phase, so the action sequence also contains random (exploration) actions.
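To make the action modeling concrete, the sketch below shows one way to turn the two Q-value outputs and the depth height map into a grasp pose: the 112x112x16 action tensor is factorized into a location map and an orientation vector, a greedy argmax picks the cell and rotation, and z is read from the depth map. The function, its arguments, and the workspace/angle conventions are illustrative assumptions, not code from this repository:

```python
import numpy as np

def decode_grasp(loc_q, rot_q, depth_heightmap, workspace_origin,
                 heightmap_res=0.004, num_rotations=16):
    """Pick a grasp (x, y, z, yaw) from the network outputs.

    loc_q:            (112, 112) numpy array, grasp-location Q-map
    rot_q:            (num_rotations,) numpy array, orientation Q-values
    depth_heightmap:  (112, 112) numpy array of heights in meters, same grid as loc_q
    workspace_origin: (x0, y0) world coordinates of the height map's corner cell
    """
    # Greedy action: best cell of the location map and best discrete rotation.
    row, col = np.unravel_index(int(np.argmax(loc_q)), loc_q.shape)
    rot_idx = int(np.argmax(rot_q))

    # Grid cell -> world coordinates on the table plane (4 mm per cell).
    x = workspace_origin[0] + col * heightmap_res
    y = workspace_origin[1] + row * heightmap_res

    # z is read from the depth height map rather than predicted by the network.
    z = float(depth_heightmap[row, col])

    # Discrete rotation index -> gripper yaw angle (assumed evenly spaced over 2*pi).
    yaw = rot_idx * (2.0 * np.pi / num_rotations)
    return x, y, z, yaw
```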
- CoppeliaSim v4.5.1 (Linux)
- PyTorch
- Open simulation/simulation.ttt in CoppeliaSim
- Run `python train_twoheadgraspnet.py`
- This repository is still under active experimentation and development
- More experiments with the one-head model are needed
- [1] Zeng, Andy, et al. "Learning synergies between pushing and grasping with self-supervised deep reinforcement learning." IROS 2018. Code: https://github.com/andyzeng/visual-pushing-grasping
- [2] Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).