-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathnotes.txt
195 lines (148 loc) · 8.11 KB
/
notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
Useful property: local state depends on a radius-n ball
(up/left/right) from t turns ago; e.g., can quickly check if a move
would result in robot squish. Not quite as helpful for efficient map
storage, as arbitrarily much stuff can be in motion at once.
Problem: global properties matter; see, e.g., example 6. Arbitrarily
complex falling rock stuff can break -- or make -- connectivity. Can
track blame per-square for board state, but a little worried about
space consumption.
Further problem: how do we tell when we've broken lambda paths? Could
keep shortest-path state, erase stuff leading out of a new rock, then
recompute at blank spaces & fixpoint. Again, storage (although the
incremental rebuild would also work backwards).
Temptation: try to breed GAs to pattern-recognize this stuff. Could
do more than just blind feedback -- monitor what happened when and be
more Lamarckian. This has the benefit of being more fun than trying
to write a better Astar or whatever than people who work for Google.
Subproblem: what primitives to give them? They need to be cheap, but
burning time stumbling around wildly is also not helpful. Maybe I'll
give them undo as an action; let them run for some number of moves and
keep the least bad ending.
Strategic undo: How to find a useful implication point to back up to?
Don't want to keep bumbling around in a broad delta of suck. Could
try to pattern-recognize checkpoints and undo points.
Metagame: am I hoping to run some evolution in response to the map? I
think so.
Pattern idea:
simple = [offset -> spacepred]
fancy = simple | simple * (offsetnz * simple) * fancy
Want something that can recognize repeating patterns in one shot --
e.g., can we get out from under this rock shelf, or is it death? On
the other hand, if we have a powerful enough undo, trying to
approximate the future like that might be more trouble than it's worth.
The important goal here, really, is to have something that's amenable
to mutation, and, especially, crossover. (Let's not repeat 2003's
mistakes.)
Now, the extra-fun question: will I ever want instant redo? I'm
thinking the answer is "why not?" (The other answer is that doing
reversible diff lists is being a "fun" way to learn how to use Rust's
region/typestate system.)
====
Thoughts on the fake-applicative array: would be nice to deforest some
of those intermediate nodes, especially because there are dead stores
in them and we might no longer even be keeping all the intermediate
steps for the real thing.
Fun thing I probably shouldn't try to do and might break everything:
write an unsafe function to dynamically check a shared box's refcount
and expose it as a unique/stack box if it's 1. Sufficient version for
this: use the refcount in an advisory way.
But also: can compact if things are absolutely small, or if doing so
makes them not too much larger, or some other heuristic. This should
all wait until I have something working, of course.
(Or even: when focusing, collect everything up to the old root, and
then swap. That might be a win depending on the usage patterns, but
it's getting late.)
====
Anyway. Ideas floating around in my head about a checkpoint cache,
and casting random paths off of them, and saving them on lambdas and
maybe randomly otherwise, and giving them decaying utilities and so
on.
Collection of checkpoints with max size and sum of weights normalized
(or virtually normalized by maintaining sum); pick one randomly (by
weight?)... and then what? Cast "standard paths"; yields collection
of new checkpoints. Obviously want to score up ones that hit lambdas
(& consider how useful short paths might be), but lambdas followed by
death is less good. Can't(?) dump them all into the pot; pick some
number by lot and add them with their scores, or something.
Consider some level of local search... ah. If fewer than N available
points, we'll just add them all and remove the original. Or, in
general, if we add k/n of the checkpoints, lower parent's score by
k/n. (The root is, of course, outside everything and unremoveable.)
(May want to memoize checkpoints by string or something; or just not
care.)
Other things: quick check for whether a move is valid. Rearrange the
"touched" to include whether/number rocks moved, to see whether "wait"
is useful. (Or the entire point vector, but maybe that's too much
memory? And I don't think we actually need that.)
Just single-direction rays, I think. Ray in same direction as parent
is almost certainly not useful and could just be skipped. Ray in
opposite direction is also probably not useful. Could just skip
casting them with high probability.
Lite version: vector of states (posns); pick one and a useful move at
random; add result to end or replace random not-zero.
Write first: bot shell, that.
====
Agenda for Sunday: make submission work. Then either the rule
extensions or the less awful bot.
In particular, selection pressure should be a win all by itself.
====
So the weighted collection heap thing is not the best idea; it's slow
and I'm feeing uninspired to work on heavyweight "mutations" to make
it maybe work okay.
Now, the original randbot: suppose, when a new state has found a
lambda, we store it several times. Behold, selection pressure. Sort
of. And, relatively easy but of uncertain value: run multiple steps
before writeback.
Consider also: if there's one useful move, overwrite yourself with it
-- or, if it's death, then bag[0]. Yet further: if there's one useful
move, then just take it -- loop taking steps until it wasn't 1. How
common is this?
Maybe too fancy: treat each thread as a backtracking search with
random decisions; having 1 forward choice gives you an extra turn;
having 0 means you undo; having several means you replicate.
Thing I didn't do and should: promote forwards over to the side, and
undemote reversal after lambda (or other fetch), maybe. (But also: if
taking multiple steps per fork, may not want to bias forwards between
different moves. So maybe skip that?)
Okay. Thing that does need to be done is the evolutionarity.
Generalize it: hitting anything not empty or earth gives a replication
bonus. Encourages (or helps explore space after) the stuff in the
expansions, if I ever implement them.
====
Something that would be neat but I probably won't do: the
hashed-state-cache search thing that someone whose name I forgot used
to good effect in 2003. It's not so useful here because doing
something wrong doesn't result in immediate death; it just makes you
wander around lost in a huge state space of pointlessness. Also I
don't have anything like a good search heuristic.
It's interesting in that it inherently memoizes the redundant
computation I'll get when I add in replication.
Although... back up. Suppose I did hashes instead of random numbers
for the new slots. Could even loop over source slots instead of
selecting randomly (does that change the variance?), but whatever. To
"replicate", give the cells utilities that decrement on use, but
aren't inherited.
This all seems like it'd have greediness problems, but whatever. Such
is life.
====
SCIENCE: large replication bonuses for lambdas seem to be not helpful.
Replication bonus for rock pushing is actually counterproductive,
because of the other rock on contest5.
Want to try a bonus for exploration: if the last thing on your
position mod (8,8) or something wasn't at that position, then you get
an extra copy. Finite space, maybe rewards more than once if the
first one gets killed, etc. Might encourage multiple subpopulations
at locations that alias; not sure how I feel about that. (Also might
want an odd number for maze maps, or to jitter the position before
lookup to break up 1-square hallway aliasing. Or maybe just don't
care.)
I'll give up and start implementing the extensions soon; at least then
I can do mediocrely on all the maps.
More science! Exploration bonus idea fails.
====
So yeah, doing moves in batches and replicating at the end seems to be
a good thing.
I can't help the feeling that it's not a quality-of-heuristic thing so
much as just having more raw speed that way, from not bouncing around
the difference graph as much. I guess I could measure, but that might
make me sad.