<html>
<head>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-101184475-4"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-101184475-4');
</script>
<meta name="mobile-web-app-capable" content="yes">
<meta name="theme-color" content="#000000">
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=no">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="HandheldFriendly" content="true">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="author" content="arjunbazinga">
<meta name="keywords" content="Reinforcement learning, humans, AI">
<title>How do "you" model people?</title>
</head>
<body>
<style>
body {
margin: auto;
background: #fffde8;
padding: 10px;
}
::-moz-selection {
background: yellow;
}
::selection {
background: yellow;
}
textarea{
margin: auto;
min-width: 95%;
max-width: 100%;
min-height: 20%;
border-radius: 6px;
box-shadow: 2px 2px 8px rgba(0, 0, 0, .3);
border: 1px solid;
}
@media (prefers-color-scheme: light) {
body {
background-color: #fffde8;
color: black;
}
}
@media (prefers-color-scheme: dark) {
body {
background-color: black;
color: #fffde8;
}
}
</style>
<div>
<h2>How do you model people?</h2>
</div>
<div>
<br>
Here is a game you can play to figure out how you model people.<br>
Play each agent for as many turns as you like.<br>
On every turn, the agent puts a reward in one of the boxes and a penalty in the other.<br>
Your job is to guess which box holds the reward, and so collect as much reward as you can.<br>
<b>Hint</b>: For each agent, try to come up with a strategy that achieves a high average reward.<br>
</div>
<br>
<br>
<div style="display:inline-block">
<div>
Pick an Agent:<br>
<input type="radio" name="Agent" id="Agent1" checked>Agent 1<br>
<input type="radio" name="Agent" id="Agent2">Agent 2<br>
<input type="radio" name="Agent" id="Agent3">Agent 3<br>
</div>
</div>
<br>
<div style="padding: 50px; position: relative; margin: auto; height: 10vh; width: 10vw">
<canvas id="myChart"></canvas>
</div>
<div>
Select a box:
<input type="button" id="box1" value="Box 1">
<input type="button" id="box2" value="Box 2">
<div id="probs" style="display:none">
Predicted probabilities:
<div id="p"></div>
</div>
</div>
<br>
<div>
<input type="button" id="reset" value="Reset Everything">
<br>
<br>
I <b>strongly</b> recommend filling the box below before clicking Reveal.<br> <br>
<textarea id="des"></textarea>
<br>
<div id="revealed" style="display:none">
<p>
Agent 1 is Random: it puts the reward in either box with equal probability.
<br> Agent 2 is Evil or, as the cool kids say, Adversarial: it tries to guess which box you're going to click
and then puts the reward in the other box.
<br> Agent 3 is Good: it tries to put the reward in the box it thinks you're going to click.
<br>
</p>
<p>
So how do Agents 2 and 3 predict what you're going to click on?
<br> This implementation closely follows the paper
<a href="https://arxiv.org/abs/1711.09883">
AI Safety Gridworlds</a>, in which the authors use simple
<a href="https://en.wikipedia.org/wiki/">Exponential Smoothing</a> to assign probabilities
to each box.<br>
Try adjusting the learning rate here:
<input type="number" id="lr" value="0.25">
<input type="button" id="lrc" value="Update">
<br>
You'll see that the closer the learning rate is to 1, the more the prediction is affected by what happened in the recent
past.<br><br>
After calculating the probabilities, the Good agent simply picks the box with the higher probability and the Evil agent does the opposite.
This simple environment is meant to test how different RL algorithms model different kinds of agents.
</p>
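<p>
If you'd like to see the idea in code, here is a rough sketch of how the prediction and the agents' choices could work.
It is not the code in fof.min.js: the names updatePrediction and placeReward are made up for illustration,
and so are the assumptions that a single number p tracks the chance you click Box 1 and that a tie at 0.5 goes to Box 1.
</p>
<pre>
```javascript
// Hypothetical sketch: none of these names come from fof.min.js.

// p      : predicted probability that the player clicks Box 1
// choice : 1 if the player just clicked Box 1, otherwise 0
// lr     : learning rate, between 0 and 1
function updatePrediction(p, choice, lr) {
  // Exponential smoothing: blend the old estimate with the newest click.
  return (1 - lr) * p + lr * choice;
}

// Returns the box (1 or 2) in which the agent places the reward.
// rand can be swapped out to make the random agent repeatable.
function placeReward(agent, p, rand = Math.random) {
  // Box the player seems more likely to click (assumed tie-break: Box 1).
  const likelyBox = p >= 0.5 ? 1 : 2;
  const otherBox = 3 - likelyBox;
  if (agent === "good") return likelyBox; // reward where you are expected to click
  if (agent === "evil") return otherBox;  // reward where you are not expected to click
  return rand() >= 0.5 ? 1 : 2;           // random agent: a fair coin flip
}

// Example: start undecided, then the player clicks Box 1 three times in a row.
let p = 0.5;
for (const choice of [1, 1, 1]) {
  p = updatePrediction(p, choice, 0.25);
}
console.log(p.toFixed(3));               // prints "0.789"
console.log(placeReward("good", p));     // prints 1
console.log(placeReward("evil", p));     // prints 2
```
</pre>
<p>
With a learning rate of 1 the prediction is simply your last click, and with a learning rate of 0 it never moves from its starting value.
</p>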
<br>
</div>
<input type="button" id="reveal" value="Reveal">
</div>
<br>
<script src="Chart.min.js"></script>
<script src="fof.min.js"></script>
<script>reset()</script>
<div style="text-align: center">
<br>
<br>
<a href="https://twitter.com/arjunsriv">@arjunsriv</a>
</div>
</body>
</html>