<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<title>Physically Grounded VLMs</title>
<meta name="description" content="Physically Grounded VLMs">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- <base href="/"> -->
<!-- <link rel="apple-touch-icon" href="apple-touch-icon.png"> -->
<!-- <link rel="icon" type="image/png" href="img/seal_icon.png"> -->
<!-- Place favicon.ico in the root directory -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.css">
<link rel="stylesheet" href="css/app.css">
<link rel="stylesheet" href="css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/1.5.3/clipboard.min.js"></script>
<script src="js/app.js"></script>
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-52J0PM8XKV"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-52J0PM8XKV');
</script>
<style>
.nav-pills {
position: relative;
display: inline;
}
.imtip {
position: absolute;
top: 0;
left: 0;
}
</style>
</head>
<body>
<div class="topnav">
<a href="index.html">Home</a>
<div class="rightnav">
<a href="scene1.html">Scene 1: Countertop</a>
<a href="scene2.html">Scene 2: Art Table</a>
<a href="scene3.html">Scene 3: Floor</a>
<a href="scene4.html">Scene 4: Kitchen A</a>
<a href="scene5.html">Scene 5: Kitchen B</a>
<a href="scene6.html" class="active">Scene 6: Salad Bar</a>
<a href="scene7.html">Scene 7: Living Room</a>
<a href="scene8.html">Scene 8: Shelf</a>
<a href="RS1/index.html">Robot Scene 1</a>
<a href="RS2/index.html">Robot Scene 2</a>
</div>
</div>
<div class="container" id="main">
<div class="row">
<div class="col-md-8 col-md-offset-2">
<br>
<h1>
Scene 6: Salad Bar
</h1>
<div class="irow">
<div class="strict_column">
<img src="img/eval2_scene6.png">
</div>
<div class="strict_column">
<p> A: envelope [sign on napkin dispenser] </p>
<p> B: humidifier [napkin dispenser] </p>
<p> C: ladle [metal tongs] </p>
<p> D: food [two salad containers on the right] </p>
<p style="color: #9900ff"> E: bottle [red wine vinegar bottle] </p>
<p> F: frying pan [closer salad tray] </p>
<p> G: paper [napkin coming out of dispenser] </p>
<p> H: countertop </p>
<p> I: bottle [olive oil bottle] </p>
<p> J: bottle [black container on the right] </p>
<p> K: bottle [black container on the left] </p>
<p> L: juice [olive oil inside bottle] </p>
<p> M: cabinetry </p>
<p> N: countertop [more tightly cropped view of the countertop] </p>
<p> O: bowl [paper plate under the counter] </p>
</div>
</div>
<br>
<p class="text-justify">
Here we show a scene from our real-scene planning evaluation, along with one example task. We list the object detections and labels from OWL-ViT, which are what our planner has access to, and give more precise labels in square brackets. Bounding boxes are color-coded for the object detections that are relevant to the example task. For each version of the LLM planner, we show its output, including any interaction with a VLM, and the generated plan. Outputs generated by the VLMs are in <span style="color: #ff9900">orange</span>, and we mark correct and incorrect plans with <span style="color: #38761d">green</span> and <span style="color: #ff0000">red</span> respectively.
</p>
<h2><b>Task 38: Bring me an empty container that you can confidently use to contain liquids, if one exists. Otherwise, tell the user that no suitable containers exist.</b></h2>
<p class="text-justify">
In this task, we want to determine whether the VLM understands which containers are empty and can contain liquids. The bottles on the counter (E, I, J, K) are not empty, and the paper plate (O) cannot be used to carry liquids because it is too flat. Therefore, the correct response is that no suitable containers exist.
</p>
<h3>Generated Plans</h3>
<div class="irow">
<div class="tcolumn">
<h3> No VLM </h3>
<p> Thought: I will check if any of the bottles are empty and suitable for containing liquids. </p>
<p> Plan: </p>
<p style="color: #ff0000">
1. Go to object E <br>
2. Pick up object E <br>
3. Bring to human object E <br>
4. Done </p>
</div>
<div class="tcolumn">
<h3> InstructBLIP </h3>
<p> Thought: I can first ask which objects are containers out of all the objects I can carry. Of these containers, I can ask which is empty and can confidently contain liquids. </p>
<p> Question about [A, B, C, D, E, F, G, H, I, J, K, L, M, N, O]: Is this object a container? </p>
<p style="color: #ff9900"> Answer <br>
A: Yes (0.9), No (0.08), Container (0.0) <br>
B: Yes (0.91), No (0.08), Unknown (0.0) <br>
C: Yes (0.85), No (0.13), Unknown (0.0) <br>
D: Yes (0.93), No (0.06), Unknown (0.0) <br>
E: Yes (0.98), No (0.02), Jar (0.0) <br>
F: Yes (0.91), No (0.08), Unknown (0.0) <br>
G: Yes (0.64), No (0.32), Unknown (0.01) <br>
H: Yes (0.89), No (0.08), Unknown (0.01) <br>
I: Yes (0.97), No (0.02), Jar (0.0) <br>
J: Yes (0.98), No (0.02), Jar (0.0) <br>
K: Yes (0.98), No (0.01), Jar (0.0) <br>
L: Yes (0.95), No (0.04), Unknown (0.0) <br>
M: Yes (0.62), No (0.33), Unknown (0.01) <br>
N: Yes (0.91), No (0.07), Bottle (0.0) <br>
O: Yes (0.96), No (0.04), Container (0.0)
</p>
<p> Thought: The objects that are confidently containers are A, B, C, D, E, F, G, H, I, J, K, L, N, and O. I should ask which of these is empty and can confidently contain liquids. </p>
<p> Question about [A, B, C, D, E, F, G, H, I, J, K, L, N, O]: Is this object empty and can confidently contain liquids? </p>
<p style="color: #ff9900"> Answer: <br>
A: Yes (0.76), No (0.21), Unknown (0.01) <br>
B: Yes (0.57), No (0.39), Unknown (0.01) <br>
C: Yes (0.65), No (0.31), Unknown (0.01) <br>
D: Yes (0.6), No (0.36), Unknown (0.01) <br>
E: Yes (0.79), No (0.19), Unknown (0.0) <br>
F: Yes (0.68), No (0.29), Unknown (0.01) <br>
G: No (0.47), Yes (0.46), Unknown (0.03) <br>
H: Yes (0.75), No (0.22), Unknown (0.01) <br>
I: Yes (0.77), No (0.21), Unknown (0.0) <br>
J: Yes (0.77), No (0.21), Unknown (0.0) <br>
K: Yes (0.73), No (0.25), Unknown (0.0) <br>
L: Yes (0.76), No (0.22), Unknown (0.0) <br>
N: Yes (0.73), No (0.24), Bottle (0.0) <br>
O: Yes (0.79), No (0.19), Unknown (0.01)
</p>
<p> Thought: The objects that are empty and can confidently contain liquids are A, B, C, D, E, F, H, I, J, K, L, N, and O. The object with the highest Yes value is E. </p>
<p> Plan: </p>
<p style="color: #ff0000">
1. Go to object E <br>
2. Pick up object E <br>
3. Bring to human object E <br>
4. Done <br>
</p>
</div>
<div class="tcolumn">
<h3> PG-InstructBLIP (ours) </h3>
<p> Thought: I can first ask which objects are empty containers out of all the objects I can carry. Of these containers, I can ask which can confidently contain liquids. </p>
<p> Question about [A, B, C, D, E, F, G, H, I, J, K, L, M, N, O]: Is this object an empty container? </p>
<p style="color: #ff9900"> Answer <br>
A: No (0.51), Yes (0.45), Unknown (0.02) <br>
B: No (0.69), Yes (0.29), Unknown (0.01) <br>
C: No (0.7), Yes (0.28), Unknown (0.01) <br>
D: No (0.85), Yes (0.14), Food (0.0) <br>
E: No (0.77), Yes (0.21), Unknown (0.0) <br>
F: No (0.85), Yes (0.14), Food (0.0) <br>
G: No (0.57), Yes (0.39), Unknown (0.02) <br>
H: No (0.64), Yes (0.34), Unknown (0.01) <br>
I: No (0.85), Yes (0.14), Unknown (0.0) <br>
J: No (0.54), Yes (0.45), Unknown (0.01) <br>
K: No (0.79), Yes (0.2), Unknown (0.0) <br>
L: No (0.65), Yes (0.34), Unknown (0.0) <br>
M: No (0.67), Yes (0.31), Unknown (0.0) <br>
N: No (0.7), Yes (0.27), Unknown (0.01) <br>
O: Yes (0.8), No (0.19), Unknown (0.0)
</p>
<p> Thought: The only object that is confidently an empty container is O. I should ask if it can confidently contain liquids next. </p>
<p> Question about [O]: Can this object confidently contain liquids? </p>
<p style="color: #ff9900"> Answer <br>
O: No (0.76), Yes (0.23), Unknown (0.0)
</p>
<p> Thought: Since object O cannot confidently contain liquids, there are no suitable containers. </p>
<p> Plan: </p>
<p style="color: #38761d">
1. Tell the human that no suitable containers exist <br>
2. Done
</p>
</div>
</div>
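<p class="text-justify">
The two-step filtering in the PG-InstructBLIP column above can be sketched in a few lines of Python. This is only an illustrative sketch, not the planner's implementation: in the actual system an LLM reads the VLM answers and decides what to ask next. The function names and the rule "keep an object when its highest-scoring answer is Yes" are assumptions made for this example; the scores are copied from the trace above.
</p>

```python
# Hypothetical sketch. The helper names and the "top answer is Yes" rule are
# assumptions for illustration, not the planner's actual implementation.

# Answers to "Is this object an empty container?" (PG-InstructBLIP trace).
EMPTY_CONTAINER = {
    "A": {"No": 0.51, "Yes": 0.45, "Unknown": 0.02},
    "B": {"No": 0.69, "Yes": 0.29, "Unknown": 0.01},
    "C": {"No": 0.70, "Yes": 0.28, "Unknown": 0.01},
    "D": {"No": 0.85, "Yes": 0.14, "Food": 0.0},
    "E": {"No": 0.77, "Yes": 0.21, "Unknown": 0.0},
    "F": {"No": 0.85, "Yes": 0.14, "Food": 0.0},
    "G": {"No": 0.57, "Yes": 0.39, "Unknown": 0.02},
    "H": {"No": 0.64, "Yes": 0.34, "Unknown": 0.01},
    "I": {"No": 0.85, "Yes": 0.14, "Unknown": 0.0},
    "J": {"No": 0.54, "Yes": 0.45, "Unknown": 0.01},
    "K": {"No": 0.79, "Yes": 0.20, "Unknown": 0.0},
    "L": {"No": 0.65, "Yes": 0.34, "Unknown": 0.0},
    "M": {"No": 0.67, "Yes": 0.31, "Unknown": 0.0},
    "N": {"No": 0.70, "Yes": 0.27, "Unknown": 0.01},
    "O": {"Yes": 0.80, "No": 0.19, "Unknown": 0.0},
}

# Answers to "Can this object confidently contain liquids?" (asked about O only).
CAN_CONTAIN_LIQUIDS = {"O": {"No": 0.76, "Yes": 0.23, "Unknown": 0.0}}


def top_answer(scores):
    """Return the answer string with the highest score."""
    return max(scores, key=scores.get)


def objects_answering_yes(answers):
    """Keep the objects whose highest-scoring answer is "Yes"."""
    return [obj for obj, scores in answers.items() if top_answer(scores) == "Yes"]


def plan(empty_answers, liquid_answers):
    """Filter by the first question, then by the second, and build a plan."""
    candidates = objects_answering_yes(empty_answers)
    suitable = [
        obj for obj in candidates if top_answer(liquid_answers[obj]) == "Yes"
    ]
    if not suitable:
        return ["Tell the human that no suitable containers exist", "Done"]
    # Among the remaining objects, prefer the one with the highest "Yes" score.
    best = max(suitable, key=lambda obj: liquid_answers[obj]["Yes"])
    return [
        f"Go to object {best}",
        f"Pick up object {best}",
        f"Bring to human object {best}",
        "Done",
    ]


if __name__ == "__main__":
    for step_number, step in enumerate(plan(EMPTY_CONTAINER, CAN_CONTAIN_LIQUIDS), 1):
        print(f"{step_number}. {step}")
```

<p class="text-justify">
With the scores above, only object O passes the first filter and it fails the second, so the sketch produces the same two-step plan shown in green for PG-InstructBLIP.
</p>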
</div>
</div>
</body>
</html>