-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
322 lines (269 loc) · 13.8 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
% Program Overview
% Tyson Whitehead
Notation
========
* The suffix *s* is added multiple times to indicate nestings of
plurals (e.g., individualss would be a collection of a collection of
individuals).
* the prefix *r* is used to indicate the reduced versions of data
(e.g., rindividualss would be a collection of a collection of
individual reduction data)
System
======
The system consists of space, world, variety, and individual data:
space
~ the size of the world and what boundaries are periodic,
world
~ global user defined parameters (e.g., current simulation year, etc.),
variety
~ per variety user defined parameters (e.g., growth rate, etc.), and
individual
~ per individual user defined parameters (e.g., height, etc.).
The world, variety, and individual data is conceptually organized into
a three level tree (i.e., one world, an arbitrary number of varieties
for this world, and an arbitrary number of individuals for each of
these varieties). The parts are maintained separately to minimize
mutation operations. The world doesn't contain the varieties, nor do
the varieties contain the individuals. Instead, there is a world,
there is a collection of varieties, and there is a collection of a
collection of individuals:
world
~ the one world
varieties
~ the collection (for the world) of variety
individualss
~ the collection (for the world) of collections (for the varieties)
of individuals
Simulation
==========
To start things, the space and world are initialized, an arbitrary
number of varieties are created, and, for each variety, an arbitrary
number of individuals are created. This gives a space and a (world,
varieties, individualss) tuple. These two represent the simulation at
any one moment in time. The simulation progresses by alternating
between performing reduction and generation across the current (world,
varieties, and individualss) tuple.
1. The reduction generates the (rworld, rvarieties, rindividuals_in,
rindividuals_out) tuple (e.g., highest individual in a variety,
number of individuals in a variety, average height of individuals
around each individual, etc.) from the current (world, varieties,
individualss) tuple.
2. The generation creates a new (world, varieties, individualss) tuple
(e.g., growing an individual by replacing it with an identical but
higher individual in the same spot, spawning new individuals of the
same variety but smaller, etc.) from the current (world, varieties,
individualss) and (rworld, rvarieties, rindividuals_in,
rindividuals_out) tuples.
Reduction
=========
Reduction is performed across across the (world, varieties,
individualss) to extract the (rworld, rvarieties, rindividualss)
parameters of interest.
rworld
~ reduction across all (world, variety, individual) tuples
rvarieties
~ collection of per variety reductions across all (world, variety,
individual) tuples
rindividualss_in
~ collection of collections of per individual reductions across all
(world, variety, individual) tuples where the later individual is
in the the former individual's in range
rindividualss_out
~ collection of collections of per individual reductions across all
(world, variety, individual) tuples where the former individual is
in the the later individual's out range
The reductions are initialized by calling a user provided function
with the data available at that level. Another user provided function
is then called to update this data for each relevant (world, variety,
individual) tuple below that level.
rworld
------
* reduced across all (world, variety, individual) tuples at the world
level (i.e., a single instance),
* rworld is initialized by calling a user provided function with
(world),
* rworld is updated by repeatedly calling a user provided function
with each (world, variety, individual) tuple, and
* can be used to calculate desired world level statistics (e.g.,
highest individual in the simulation).
rvarieties
----------
* reduced across all (world, variety, individual) tuple at the variety
level (i.e., a single instance for each variety),
* each rvariety is initialized by calling a user provided function
with the (world, variety) tuple for the variety,
* each rvariety is updated by repeatedly calling a user provided
function with each (world, variety, individual) tuple such that
(world, variety) is the tuple for the variety, and
* can be used to calculate desired variety level statistics across all
individuals of that variety (e.g., highest individual for each
variety).
rindividualss_in
----------------
* reduced across all (world, variety, individual0, individual1) tuples
at the individual level (i.e., a single instance for each
individual),
* each rindividual_in is initialized by calling a user provided
function with the (world, variety, individual) tuple for the
individual,
* the in region for each individual is given by a user provided function
passed the (world, variety, individual) tuple for the individual,
* each rindividual_in is updated by repeatedly calling a user provided
function with each (world, variety0, variety1, individual0,
individual1) tuple such that (world, variety0, individual0) is the
tuple for the individual and (world, variety1, individual1) is the
tuple for the other individual in that individual's in range, and
* can be used to calculate desired individual level statistics across
all individuals encompassed by each individual's in range (e.g.,
tallest individual encompassed by the in range of each individual).
rindividualss_out
-----------------
* reduced across all (world, variety, individual0, individual1) tuples
at the individual level (i.e., a single instance for each individual)
* each rindividual_out is initialized by calling a user provided
function with the (world, variety, individual) tuple for the
individual,
* the out region for each individual is given by a user provided function
passed the (world, variety, individual) tuple for the individual,
* each rindividual_out is updated by repeatedly calling a user
provided function with each (world, variety0, variety1, individual0,
individual1) tuple such that (world, variety0, individual0) is the
tuple for the other individual whose out range the individual is in
and (world, variety1, individual1) is the tuple for the individual,
and
* can be used to calculated desired individual level statistics across
all individuals whose out range encompasses each individual (e.g.,
tallest individual whose out reach encompasses each individual).
Generation
==========
The next step is generated from outside in. Each (individual) is
mapped to a list of new individuals; each (variety, individuals) tuple
is mapped to a list of new (variety, individuals); and the (world,
varieties, individualss) tuple is mapped to a new (world, varieties,
individualss) tuple.
For the individual and variety levels the corresponding component
in the final (world, varieties, individualss) can be
1. deleted by returning an empty list,
2. left alone by returning a list with the a copy of the input,
3. updated by returning a list with an updated copy of the input,
4. replaced with an arbitrary number of new ones by returning
a list of the replacement elements.
As there is only a single world returned, the world level only
supports the second and third of these.
individual
----------
* each individual is mapped to a list of new individuals by calling a
user defined function with the (world, variety, individual,
rworld, rvariety, rindividual_in, rindividual_out) tuple for the
individual,
* for each variety, the combined collection of new individuals and
from this mapping of current individuals becomes the set of
individuals mapped with that variety at the variety level
variety
-------
* each variety and its associated individuals (produced by the above
individual mappings) are mapped to a list of new varieties and
associated individualss by calling a user defined function with
the (world, variety, individuals, rworld, rvariety) tuple for the
variety
* the combined collection of new varieties and associated individualss
from this mapping of current varieties and associated individualss
become the set of varieties and associated individualss mapped with
the world at the world level
world
-----
* the world and its associated varieties and individualss (produced
above by the variety mappings) are mapped to a new world and
associated varieties and individualss by calling a user defined
function with the (world, varieties, individualss, rword) tuple
* the new world and associated varieties and individualss from this
mapping becomes the new (world, varieties, individualss) tuple
Complexity
==========
The most complex part of the program is performing the individual to
individual reductions efficiently. A double loop across all
individuals (i.e., for each individual check every other individual to
see it it falls in the desired region) would be $\mathcal O(N^2)$,
were $N$ is the total number of individuals.^[Saying $f(x)$ is
$\mathcal O(g(x))$ means $\limsup_x f(x)/g(x) < \infty$ (i.e., $f(x)$
is eventually bound by constant multiple of $g(x)$).]
To avoid this, the system maps the individuals to their location along
a space filling 1D Z-curve and stores them in this sorted order. The
mapping to a Z-curve is given by interleaving the binary digits of the
coordinates. A key property of this mapping is that the Z-value of
the upper-left and lower-right coordinates of a rectangular region
bound the Z-values of all points within that region.
Combined with storing individuals in sorted order by their Z-value,
this allows all the individuals in a rectangular region to be located
in better than $\mathcal O(N)$. The algorithm works by alternating
between Z-values and individuals because not all Z-values between the
lower and upper and bounds (the Z-values of the upper-left and
lower-right corners of the region) fall within the region.
- The first Z-value is the upper-left corner of the region (the
Z-value of the upper-left corner of rectangular region is the the
smallest Z-value within that region).
- The first individual with a greater or equal Z-value is then located
(this is $\mathcal O(\mathrm{ln}(N))$ due to the individuals
being sorted by Z-value).
- This located individual and all subsequent ones are processed until
the first individual whose Z-value does not fall in the region is
located.
- If this individual's Z-value falls outside of the upper-bound (the
Z-value of the lower-right corner) then all individuals have been
located.
- Otherwise the next point at which the Z-curve intersects the
region is calculated and the process resumes with locating the
first individual with a greater or equal Z-value.
The number of iterations in the above algorithm is a combination of
the number of individuals in the region plus the number of times the
algorithm switches between being inside and outside the the region.
The later is bound by the number of times the Z-curve enters and
exists the region. This, in turn, is bound by a multiple (a function
of the precision of the Z-curve) of the region's boundary.
Assuming the number of individuals in a region is proportional to the
area of the region, the former (the number of individuals in the
region) will dominate the later (the number of times the Z-curve
enters and exits the region). Assuming some upper bound on the size
of all regions, it follows that finding all individuals in a given
region is $\mathcal O(\mathrm{ln}(N))$.
The complexity of finding all individuals in the (globally bounded)
regions about all individuals is then $\mathcal O(N \mathrm{ln}(N))$.
The sorting of all individuals by their Z-values (required for the
algorithm) is also $\mathcal O(N \mathrm{ln}(N))$. It follows that
the overall complexity is $\mathcal O(N \mathrm{ln}(N))$.
Future
======
The current system can't handle overlapping in and out regions where
the individuals themselves fall outside of both regions (i.e.,
individuals are located at a point within their regions so it is
possible for the overlap of the regions to not include the
individuals). The out version is also less desirable because it is a
scatter operation. This forces simultaneous reducers to synchronize
or maintain separate reduction space for later merger.
Replacing the individual_in and individual_out aggregation/reductions
with just a single individual reduction to address these issues.
individual
----------
* reduced across all (world, variety, individual0, individual1) tuples
at the individual level (i.e., a single instance for each individual)
* each rindividual is initialized by calling a user provided function
with the (world, variety, individual) tuple for the individual,
* the region for each individual is given by a user provided function
passed the (world, variety, individual) tuple the individual,
* each rindividual is updated by repeatedly calling a user
provided function with each (world, variety0, variety1, individual0,
individual1) tuple such that (world, variety0, individual0) is the tuple
for the individual and (world, variety1, individual1) is the tuple
for the other individual whose range overlaps, and
* can be used to calculated desired individual level statistics across
all individuals whose ranges overlap with each individual (e.g.,
tallest individual whose reach overlaps).
The precision of the Z-curve plays a critical role in determining the
constant factor in the overall $\mathcal O(N \mathrm{ln}(N))$
complexity. Less precise Z-values mean less stepping in and out of
regions (and thus a smaller constant factor), but they also mean the
Z-value filtering is more crude so more individuals are processed as
being potentially in a region. Early profile results indicate that
there could be significant performance gains to be made by reducing
the current precision of the Z-values to obtain some optimal balance
between these two effects.