# Robot Grasping - Generalising to Novel Objects

A repository to track progress in robot grasping, including datasets and the current state of the art.

## Methods

### Detecting Grasp Poses

Summary: Given two or more images, the algorithm finds a small set of points that indicate good grasping locations. These points are then triangulated to compute a 3D grasping position. It is a supervised learning method trained on synthetic data, and it grasps a wide range of (unseen) objects effectively.
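A minimal sketch of the triangulation step, assuming calibrated cameras with known 3x4 projection matrices (the function name and inputs are illustrative, not from the paper):

```python
import numpy as np

def triangulate_grasp_point(P1, P2, uv1, uv2):
    """Recover a 3D grasp point from its pixel locations in two calibrated views.

    P1, P2   : (3, 4) camera projection matrices (intrinsics @ [R | t]).
    uv1, uv2 : (u, v) pixel coordinates of the same predicted grasp point.
    Returns the 3D point via a linear least-squares (DLT) solve.
    """
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize
```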

Summary: Introduces a 5-dimensional grasp representation and presents a two-step cascaded system. The input is a single RGB-D image. The first network has few features and efficiently prunes unlikely grasps; the second, larger network evaluates only the remaining candidates and outputs the best grasp. This grasp rectangle is then converted to a robot grasp consisting of a grasping point and an approach vector, computed from the rectangle's parameters and the surface normal at the rectangle's centre. The network is trained on the Cornell Dataset, which is set up specifically for parallel-gripper robots.
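A sketch of what the rectangle-to-robot-grasp conversion could look like; the field names, the single-depth back-projection, and the use of the negated surface normal as the approach axis are illustrative assumptions, not the paper's code:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class GraspRectangle:
    """5D grasp: rectangle centre (x, y), orientation theta, height h, width w,
    all in image coordinates."""
    x: float
    y: float
    theta: float   # gripper orientation in the image plane (radians)
    h: float       # gripper opening
    w: float       # gripper plate width

def to_robot_grasp(rect, depth, K, surface_normal):
    """Convert an image-plane grasp rectangle to a 3D grasp point + approach vector.

    depth          : depth value (metres) at the rectangle centre
    K              : 3x3 camera intrinsics matrix
    surface_normal : unit normal at the rectangle centre, used as approach axis
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Back-project the rectangle centre to a 3D grasp point in the camera frame.
    point = np.array([(rect.x - cx) * depth / fx,
                      (rect.y - cy) * depth / fy,
                      depth])
    # Approach the object along the (negated) surface normal.
    approach = -surface_normal / np.linalg.norm(surface_normal)
    return point, approach
```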

Summary: Presents single-stage regression to grasp bounding boxes, avoiding sliding-window methods; runs at 13 fps on a GPU. Can also predict multiple grasps, which works better, especially for objects that can be grasped in several ways. Uses the 5D representation. A standard convolutional network outputs 6 neurons; it is trained on the Cornell Dataset and pretrained on ImageNet. Best so far: 88 percent accuracy.
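A minimal sketch of the single-stage idea: a pretrained backbone with a 6-neuron regression head. The ResNet-18 backbone and the (x, y, sin 2θ, cos 2θ, h, w) output encoding are assumptions for illustration; the paper's network and parameterisation may differ.

```python
import torch
import torch.nn as nn
from torchvision import models

class DirectGraspRegressor(nn.Module):
    """Single-stage grasp regression: one forward pass, one 5D grasp
    (angle encoded as sin/cos, hence 6 output neurons)."""
    def __init__(self):
        super().__init__()
        # ImageNet-pretrained backbone (torchvision >= 0.13 weights API).
        backbone = models.resnet18(weights="IMAGENET1K_V1")
        backbone.fc = nn.Identity()      # keep the pooled 512-d feature
        self.backbone = backbone
        self.head = nn.Linear(512, 6)    # x, y, sin 2θ, cos 2θ, h, w (assumed)

    def forward(self, rgb):              # rgb: (B, 3, 224, 224)
        return self.head(self.backbone(rgb))
```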

Summary: Implements ResNet-50. Cornell Dataset, pretrained on ImageNet, 5D pose. Best so far: 89.1 percent accuracy. Not tested on a real robot.

Summary: Predicts multiple grasp poses. The network has two parts: a feature extractor (DNN) and a multi-grasp predictor that regresses grasp rectangles from oriented anchor boxes and classifies them as graspable or not. Cornell Dataset. Best so far: 97.74 percent accuracy. Not tested on a real robot.
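A minimal sketch of what an oriented-anchor grasp head could look like; the layer sizes, number of anchors, and output layout are assumptions for illustration:

```python
import torch
import torch.nn as nn

class OrientedAnchorGraspHead(nn.Module):
    """For each feature-map cell and each of `num_anchors` oriented anchor boxes,
    regress 5 grasp-rectangle offsets (x, y, theta, h, w) and classify the
    anchor as graspable or not."""
    def __init__(self, in_channels=256, num_anchors=6):
        super().__init__()
        self.num_anchors = num_anchors
        self.reg = nn.Conv2d(in_channels, num_anchors * 5, kernel_size=1)  # rectangle offsets
        self.cls = nn.Conv2d(in_channels, num_anchors * 2, kernel_size=1)  # graspable / not

    def forward(self, features):                       # features: (B, C, H, W)
        b, _, h, w = features.shape
        offsets = self.reg(features).view(b, self.num_anchors, 5, h, w)
        scores = self.cls(features).view(b, self.num_anchors, 2, h, w)
        return offsets, scores
```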

Future work: Detect grasp locations for all objects in an image. Handle overlapping objects.

Summary: Proposes a rotation ensemble module (REM): convolutions whose weights are rotated. 5D poses; Cornell Dataset: 99.2 percent accuracy. Tested on a real (4-axis) robot: 93.8 percent success rate (on 8 small objects).
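A highly simplified sketch of the weight-rotation idea, restricted to 90-degree rotations and mean fusion so it stays short; the real REM uses finer rotations and a different aggregation, so treat this only as an illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RotatedConvEnsemble(nn.Module):
    """Run the same convolution with its kernel rotated by 0/90/180/270 degrees
    and average the responses (simplified stand-in for a rotation ensemble)."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_channels, in_channels, kernel_size, kernel_size) * 0.01)
        self.padding = kernel_size // 2

    def forward(self, x):
        responses = [
            F.conv2d(x, torch.rot90(self.weight, k, dims=[2, 3]), padding=self.padding)
            for k in range(4)
        ]
        return torch.stack(responses, dim=0).mean(dim=0)
```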

#### Surveys

Summary: Covers object localization, object segmentation, 6D pose estimation, grasp detection, end-to-end methods, motion planning, and datasets.

### Deep Reinforcement Learning

Each image pixel corresponds to a movement (either a push or a grasp) executed at the 3D location of that pixel in the scene.

The input to an FCN is a single image; the network makes dense pixel-wise predictions of future expected reward: fully convolutional action-value functions.

Each state s_t is an RGB-D heightmap image representation at time step t. "Each individual FCN φ_ψ takes as input the heightmap image representation of the state s_t and outputs a dense pixel-wise map of Q values with the same image size and resolution as that of s_t, where each individual Q value prediction at a pixel p represents the future expected reward of executing primitive ψ at 3D location q where q ↦ p ∈ s_t. Note that this formulation is a direct amalgamation of Q-learning with visual affordance-based manipulation."
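A sketch of how the greedy policy over such dense Q maps could look: pick the primitive and pixel with the highest predicted value, then map that pixel back to a 3D location via the heightmap. The function and the random stand-in maps are illustrative, not the authors' code.

```python
import torch

def select_action(q_maps):
    """Greedy action selection over dense pixel-wise Q maps.

    q_maps : dict mapping each motion primitive name (e.g. 'push', 'grasp')
             to a (H, W) tensor of Q values predicted by that primitive's FCN.
    Returns the best primitive and its pixel (row, col).
    """
    best = None
    for primitive, q in q_maps.items():
        idx = torch.argmax(q)
        row, col = divmod(idx.item(), q.shape[1])
        value = q[row, col].item()
        if best is None or value > best[0]:
            best = (value, primitive, (row, col))
    return best[1], best[2]

# Example with random Q maps as stand-ins for FCN outputs:
q_maps = {"push": torch.rand(224, 224), "grasp": torch.rand(224, 224)}
primitive, pixel = select_action(q_maps)
```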

Future work: 1) Use a representation other than a heightmap. 2) Train on a larger variety of shapes. 3) Add more motion/manipulation primitives.

#### Surveys

Summary: Reviews deep RL methods in a realistic *simulated* environment: off-policy Q-learning, regression with Monte Carlo returns, corrected Monte Carlo, Deep Deterministic Policy Gradient, and Push Consistency Learning. Deep Q-learning performs best in low-data regimes; Monte Carlo methods perform somewhat better in the high-data regime.
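For readers unfamiliar with the distinction being compared, here is a tiny worked sketch of the two target types (the rewards and discount are illustrative numbers, not from the survey):

```python
# Discounted return targets for a single trajectory of rewards r_0..r_{T-1}.
gamma = 0.9
rewards = [0.0, 0.0, 1.0]          # e.g. grasp success only on the final step

# Monte-Carlo target for step t: the full discounted return to the end.
def mc_return(t):
    return sum(gamma ** (k - t) * rewards[k] for k in range(t, len(rewards)))

# One-step bootstrapped (Q-learning) target for step t:
# immediate reward plus the discounted value of the best next action.
def q_target(t, max_next_q):
    return rewards[t] + gamma * max_next_q

print(mc_return(0))        # 0.81, computed from observed rewards only
print(q_target(0, 0.9))    # 0.81, but only if the next-state estimate is accurate
```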

Future work: 1) Combine the best of bootstrapping and multi-step returns. 2) Evaluate similar methods on real robots.

### Other

Summary: Pixel-wise probability predictions for four different grasping primitives. Manually annotated dataset in which pixels are labelled 0, 1, or neither.
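One common way to train with pixels labelled 0, 1, or neither is to mask the unlabelled pixels out of the loss; the tensor shapes and the -1 "neither" convention below are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def masked_pixelwise_loss(logits, labels):
    """Pixel-wise binary cross-entropy that ignores unlabelled pixels.

    logits : (B, 4, H, W) raw scores, one channel per grasping primitive.
    labels : (B, 4, H, W) tensor with 1 (positive), 0 (negative), or -1
             for pixels labelled as neither.
    """
    mask = labels >= 0    # drop the 'neither' pixels from the loss
    return F.binary_cross_entropy_with_logits(logits[mask], labels[mask].float())
```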

Summary: Kit assembly. Little real training data is available, so training data is generated by time-reversing disassembly sequences. Learns policies for robotic assembly that generalize to new objects.