forked from codeplaysoftware/syclacademy
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.html
476 lines (455 loc) · 16 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="../common-revealjs/css/reveal.css">
<link rel="stylesheet" href="../common-revealjs/css/theme/white.css">
<link rel="stylesheet" href="../common-revealjs/css/custom.css">
<script>
// This is needed when printing the slides to pdf
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? '../common-revealjs/css/print/pdf.css' : '../common-revealjs/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<script>
// This is used to display the static images on each slide,
// See global-images in this html file and custom.css
(function() {
if(window.addEventListener) {
window.addEventListener('load', () => {
let slides = document.getElementsByClassName("slide-background");
if (slides.length === 0) {
slides = document.getElementsByClassName("pdf-page")
}
// Insert global images on each slide
for(let i = 0, max = slides.length; i < max; i++) {
let cln = document.getElementById("global-images").cloneNode(true);
cln.removeAttribute("id");
slides[i].appendChild(cln);
}
// Remove top level global images
let elem = document.getElementById("global-images");
elem.parentElement.removeChild(elem);
}, false);
}
})();
</script>
</head>
<body>
<div class="reveal">
<div class="slides">
<div id="global-images" class="global-images">
<img src="../common-revealjs/images/sycl_academy.png" />
<img src="../common-revealjs/images/sycl_logo.png" />
<img src="../common-revealjs/images/trademarks.png" />
</div>
<!--Slide 1-->
<section class="hbox" data-markdown>
## Managing Data
</section>
<!--Slide 2-->
<section class="hbox" data-markdown>
## Learning Objectives
* Learn about the USM and buffer/accessor models for managing data
* Learn how to allocate, transfer and free memory using USM.
* Learn how a buffer synchronizes data
* Learn how to access data in a kernel function
</section>
<section>
<div class="hbox" data-markdown>
#### Memory Models
</div>
<div class="container" data-markdown>
* In SYCL there are two models for managing data:
* The buffer/accessor model.
* The USM (unified shared memory) model.
* Which model you choose can have an effect on how you enqueue kernel functions.
</div>
</section>
<!--Slide 3-->
<!--Slide 3A-->
<section>
<div class="hbox" data-markdown>
#### CPU and GPU Memory
</div>
<div class="container" data-markdown>
* A GPU has its own memory, separate to CPU memory.
* In order for the GPU to use memory from the CPU, the following actions must take place (either explicitly or implicitly):
* Memory allocation on the GPU.
* Data migration from the CPU to the allocation on the GPU.
* Some computation on the GPU.
* Migration of the result back to the CPU.
</div>
<img src="../common-revealjs/images/gpu_cpu_memory.png" height="400">
</section>
<!--Slide 3A-->
<section>
<div class="hbox" data-markdown>
#### CPU and GPU Memory
</div>
<div class="container" data-markdown>
* Memory transfers between CPU and GPU are a bottleneck.
* We want to minimize these transfers, when possible.
</div>
<img src="../common-revealjs/images/gpu_cpu_memory.png" height="400">
</section>
<!--Slide 3B-->
<section>
<div class="hbox" data-markdown>
#### USM Allocation Types
</div>
<div class="container" data-markdown>
* There are different ways USM memory can be allocated: host, device and shared.
![SYCL](../common-revealjs/images/Figure6-1bookUSMtypes.png "SYCL")
(from book)
</div>
</section>
<!--Slide 3-->
<section>
<div class="hbox" data-markdown>
#### Using USM - Malloc Device
</div>
<div class="container">
<code class="code-100pc"><pre>
// Allocate memory on device
T *device_ptr = sycl::malloc_device<T>(n, myQueue);
// Copy data to device
myQueue.memcpy(device_ptr, cpu_ptr, n * sizeof(T));
// ...
// Do some computation on device
// ...
// Copy data back to CPU
myQueue.memcpy(result_ptr, device_ptr, n * sizeof(T)).wait();
// Free allocated data
sycl::free(device_ptr, myQueue);
</code></pre>
</div>
<div class="container" data-markdown>
* It is important to free memory after it has been
used to avoid memory leaks.
</div>
</section>
<section>
<div class="hbox" data-markdown>
#### Using USM - Malloc Shared
</div>
<div class="container">
<code class="code-100pc"><pre>
// Allocate shared memory
T *shared_ptr = sycl::malloc_shared<T>(n, myQueue);
// Shared memory can be accessed on host as well as device
for (auto i = 0; i < n; ++i)
shared_ptr[i] = i;
// ...
// Do some computation on device
// ...
// Free allocated data
sycl::free(shared_ptr, myQueue);
</code></pre>
</div>
<div class="container" data-markdown>
* Shared memory is accessible on host and device.
* Performance of shared memory accesses may be poor depending on platform.
</div>
</section>
<section>
<div class="hbox" data-markdown>
#### SYCL Buffers & Accessors
</div>
<div class="container" data-markdown>
* SYCL provides an API which takes care of allocations and `memcpy`s, as well as some other things.
</div>
</section>
<section>
<div class="hbox" data-markdown>
#### SYCL Buffers & Accessors
</div>
<div class="container" data-markdown>
* The buffer/accessor model separates the storage and access of data
* A SYCL buffer manages data across the host and any number of devices
* A SYCL accessor requests access to data on the host or on a device for a specific SYCL kernel function
* Accessors are also used to access data within a SYCL kernel function
* This means they are declared in the host code but captured by and then accessed within a SYCL kernel function
</div>
</section>
<!--Slide 4-->
<section>
<div class="hbox" data-markdown>
#### SYCL Buffers & Accessors
</div>
<div class="container">
<div class="col" data-markdown>
* A SYCL buffer can be constructed with a pointer to host memory
* For the lifetime of the buffer this memory is owned by the SYCL runtime
* When a buffer object is constructed it will not allocate or copy to device memory at first
* This will only happen once the SYCL runtime knows the data needs to be accessed and where it needs to be accessed
</div>
<div class="col" data-markdown>
![Buffer Host Memory](../common-revealjs/images/buffer-hostmemory.png "Buffer Host Memory")
</div>
</div>
</section>
<!--Slide 5-->
<section>
<div class="hbox" data-markdown>
#### SYCL Buffers & Accessors
</div>
<div class="container">
<div class="col" data-markdown>
* Constructing an accessor specifies a request to access the data managed by the buffer
* There are a range of different types of accessor which provide different ways to access data
</div>
<div class="col" data-markdown>
![Buffer Host Memory Accessor](../common-revealjs/images/buffer-hostmemory-accessor.png "Buffer Host Memory Accessor")
</div>
</div>
</section>
<!--Slide 6-->
<section>
<div class="hbox" data-markdown>
#### SYCL Buffers & Accessors
</div>
<div class="container">
<div class="col" data-markdown>
* When an accessor is constructed it is associated with a command group via the handler object
* This connects the buffer that is being accessed, the way in which it’s being accessed and the device that the command group is being submitted to
</div>
<div class="col" data-markdown>
![Buffer Host Memory Accessor CG](../common-revealjs/images/buffer-hostmemory-accessor-cg.png "Buffer Host Memory Accessor CG")
</div>
</div>
</section>
<!--Slide 7-->
<section>
<div class="hbox" data-markdown>
#### SYCL Buffers & Accessors
</div>
<div class="container">
<div class="col" data-markdown>
* Once the SYCL scheduler selects the command group to be executed it must first satisfy its data dependencies
* If necessary, this includes allocating and copying the data to the device accessing that data
* If the most recent copy of the data is already on the device then the runtime will not copy again
</div>
<div class="col" data-markdown>
![Buffer Host Memory Accessor CG Device](../common-revealjs/images/buffer-hostmemory-accessor-cg-device.png "Buffer Host Memory Accessor CG Device")
</div>
</div>
</section>
<!--Slide 8-->
<section>
<div class="hbox" data-markdown>
#### SYCL Buffers & Accessors
</div>
<div class="container">
<div class="col" data-markdown>
* Data will remain in device memory after kernels finish executing until another accessor requests access in a different device or on the host
* When the buffer object is destroyed it will wait for any outstanding work that is accessing the data to complete and then copy back to the original host memory
</div>
<div class="col" data-markdown>
![Buffer Destroyed](../common-revealjs/images/buffer-destroyed.png "Buffer Destroyed")
</div>
</div>
</section>
<!--Slide 9-->
<section>
<div class="hbox" data-markdown>
#### SYCL Buffers & Accessors
</div>
<div class="container">
<code class="code-100pc"><pre>
T var = 42;
{
// Create buffer pointing to var.
auto buf = sycl::buffer{&var, sycl::range<1>{1}};
// ...
// Do some computation on device. Use accessors to access buffer
// ...
} // var updated here
assert(var != 42);
</code></pre>
</div>
<div class="container" data-markdown>
* A buffer is associated with a type, range and
dimensionality. Dimensionality must be either 1, 2 or
3.
* Usually type and dimensionality can be inferred using
CTAD.
* If a buffer is associated with some allocation in host
memory, the host memory will be updated only once the
buffer goes out of scope.
</div>
</section>
<!--Slide 10-->
<section>
<div class="hbox" data-markdown>
#### Accessor class
</div>
<div class="hbox" data-markdown>
![Accessor Types](../common-revealjs/images/accessor-types.png "Accessor Types")
</div>
</section>
<!--Slide 12-->
<section>
<div class="hbox" data-markdown>
#### Accessor class
</div>
<div class="container" data-markdown>
* There are many different ways to use the `accessor`
class.
* Accessing data on a device.
* Accessing data immediately in the host application.
* Allocating local memory.
* For now we are going to focus on accessing data on a
device.
</div>
</section>
<!--Slide 13-->
<section>
<div class="hbox" data-markdown>
#### Constructing an accessor
</div>
<div class="container">
<code class="code-100pc"><pre>
auto acc = sycl::accessor{bufA, cgh};
</code></pre>
</div>
<div class="container" data-markdown>
* There are many ways to construct an `accessor`.
* The `accessor` class supports CTAD so it's not
necessary to specify all of the template arguments.
* The most common way to construct an `accessor` is from
a `buffer` and a `handler` associated with the command
group function you are within.
* The element type and dimensionality are inferred from
the `buffer`.
* The `access_mode` is defaulted to
`access_mode::read_write`.
</div>
</section>
<!--Slide 14-->
<section>
<div class="hbox" data-markdown>
#### Specifying the access mode
</div>
<div class="container">
<code class="code-100pc"><pre>
auto readAcc = sycl::accessor{bufA, cgh, sycl::read_only};
auto writeAcc = sycl::accessor{bufB, cgh, sycl::write_only};
</code></pre>
</div>
<div class="container" data-markdown>
* When constructing an `accessor` you will likely also
want to specify the `access_mode`
* You can do this by passing one of the CTAD tags:
* `read_only` will result in `access_mode::read`.
* `write_only` will result in `access_mode::write`.
</div>
</section>
<!--Slide 15-->
<section>
<div class="hbox" data-markdown>
#### Specifying no initialization
</div>
<div class="container">
<code class="code-100pc"><pre>
auto acc = sycl::accessor{buf, cgh, sycl::no_init};
</code></pre>
</div>
<div class="container" data-markdown>
* When constructing an `accessor` you may also want to
discard the original data of a `buffer`.
* You can do this by passing the `no_init` property.
</div>
</section>
<!--Slide 16-->
<section>
<div class="hbox" data-markdown>
#### Using Accessors
</div>
<div class="container">
<code class="code-100pc"><pre>
T var = 42;
{
// Create buffer pointing to var.
auto bufA = sycl::buffer{&var, sycl::range<1>{1}};
auto bufB = sycl::buffer{&var, sycl::range<1>{1}};
q.submit([&](sycl::handler &cgh) {
auto accA = sycl::accessor{bufA, cgh, sycl::read_only};
auto accB = sycl::accessor{bufA, cgh, sycl::no_init};
cgh.single_task<mykernel>(...); // Do some work
});
} // var updated here
assert(var != 42);
</code></pre>
</div>
<div class="col-right-2" data-markdown>
* Buffers and accessors take care of memory
migration, as well as dependency analysis.
* More to come later on dependencies.
</div>
</section>
<!--Slide 19-->
<section>
<div class="hbox" data-markdown>
#### operator[]
</div>
<div class="container">
<code class="code-100pc"><pre>
gpuQueue.submit([&](handler &cgh){
auto inA = sycl::accessor{bufA, cgh, sycl::read_only};
auto inB = sycl::accessor{bufB, cgh, sycl::read_only};
auto out = sycl::accessor{bufO, cgh, sycl::write_only};
cgh.single_task<mykernel>([=]{
<mark>out[0] = inA[0] + inB[0];</mark>
});
});
</code></pre>
</div>
<div class="container" data-markdown>
* As well as specifying data dependencies an `accessor`
can also be used to access the data from within a kernel
function.
* You can do this by calling `operator[]` on the
`accessor`.
* `operator[]` for USM pointers must take a `size_t`,
whereas `operator[]` for accessors can take a
multi-dimensional `sycl::id` or a `size_t`.
</div>
</section>
<!--Slide 20-->
<section>
<div class="hbox" data-markdown>
## Questions
</div>
</section>
<!--Slide 21-->
<section>
<div class="hbox" data-markdown>
#### Exercise
</div>
<div class="container" data-markdown>
Code_Exercises/Managing_Data/source
</div>
<div class="container" data-markdown>
Implement a SYCL application that adds two variables
and returns the result using:
</div>
<div class="container" data-markdown>
1. The USM memory model
2. The buffer/accessor memory model.
</div>
</section>
</div>
</div>
<script src="../common-revealjs/js/reveal.js"></script>
<script src="../common-revealjs/plugin/markdown/marked.js"></script>
<script src="../common-revealjs/plugin/markdown/markdown.js"></script>
<script src="../common-revealjs/plugin/notes/notes.js"></script>
<script>
Reveal.initialize();
Reveal.configure({ slideNumber: true });
</script>
</body>
</html>