🚚 move reduce and transpose to root dir
Tongkaio committed Jun 4, 2024
1 parent 23bc3ef commit 84d9f2b
Showing 11 changed files with 20 additions and 11 deletions.
10 changes: 0 additions & 10 deletions example/reduce/README.md

This file was deleted.

File renamed without changes.
20 changes: 20 additions & 0 deletions reduce/README.md
@@ -0,0 +1,20 @@
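# Reduction
Task: compute the sum of a given array.

1. device_reduce_v0: uses only **global memory**; N must be an integer multiple of BLOCK_SIZE
2. device_reduce_v1: uses (static) **shared memory**; N no longer has to be an integer multiple of BLOCK_SIZE, and the reduction does not modify the data in global memory
3. device_reduce_v2: a modification of v1 that uses (dynamic) **shared memory**; performance is unchanged
4. device_reduce_v3: a modification of v2 that uses an atomic function, so a final reduction pass on the CPU is no longer needed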
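
The ideas behind the later variants (dynamic shared memory, no restriction on N, and an atomic function in place of the final CPU pass) can be sketched as follows. This is a minimal illustration, not the repository's actual code: the kernel name `reduce_sketch`, the test size, and the all-ones input are assumptions chosen so that the float sum is exact and easy to check.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel; the repository's device_reduce_v3 may be written differently.
__global__ void reduce_sketch(const float *d_x, float *d_sum, const int N)
{
    extern __shared__ float s_y[];  // dynamic shared memory, sized at launch
    const int tid = threadIdx.x;
    const int n = blockIdx.x * blockDim.x + tid;

    // Out-of-range threads contribute 0, so N need not be a multiple of blockDim.x.
    // Global memory is only read here, never overwritten.
    s_y[tid] = (n < N) ? d_x[n] : 0.0f;
    __syncthreads();

    // Tree reduction in shared memory; assumes blockDim.x is a power of two.
    for (int offset = blockDim.x >> 1; offset > 0; offset >>= 1) {
        if (tid < offset) {
            s_y[tid] += s_y[tid + offset];
        }
        __syncthreads();
    }

    // One atomic add per block replaces the final reduction on the CPU.
    if (tid == 0) {
        atomicAdd(d_sum, s_y[0]);
    }
}

int main()
{
    const int N = 1000003;        // assumed size, deliberately not a multiple of BLOCK_SIZE
    const int BLOCK_SIZE = 128;
    const int grid_size = (N + BLOCK_SIZE - 1) / BLOCK_SIZE;

    float *h_x = (float *)malloc(N * sizeof(float));
    for (int i = 0; i < N; ++i) {
        h_x[i] = 1.0f;            // all ones: the exact sum is N, which fits in a float
    }

    float *d_x = nullptr;
    float *d_sum = nullptr;
    cudaMalloc(&d_x, N * sizeof(float));
    cudaMalloc(&d_sum, sizeof(float));
    cudaMemcpy(d_x, h_x, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_sum, 0, sizeof(float));  // the accumulator must start at zero

    // The third launch parameter is the dynamic shared memory size in bytes.
    reduce_sketch<<<grid_size, BLOCK_SIZE, BLOCK_SIZE * sizeof(float)>>>(d_x, d_sum, N);

    float h_sum = 0.0f;
    cudaMemcpy(&h_sum, d_sum, sizeof(float), cudaMemcpyDeviceToHost);  // waits for the kernel

    const bool ok = (cudaGetLastError() == cudaSuccess) && (h_sum == (float)N);
    printf("sum = %f (%s)\n", h_sum, ok ? "ok" : "mismatch");

    cudaFree(d_x);
    cudaFree(d_sum);
    free(h_x);
    return ok ? 0 : 1;
}
```

With a static shared array (`__shared__ float s_y[128];`) the third launch parameter would be dropped. Because the order of the per-block atomic additions is not fixed, the result for general float input can differ slightly from run to run.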

## Results
Test results for N = 100000000, BLOCK_SIZE = 128:
```
[reduce_host]: sum = -1209.635986, total_time_h = 388.534760 ms
[reduce_v0]: sum = -22739588.000000, total_time_0 = 31.805029 ms
[reduce_v1]: sum = -1208.930542, total_time_1 = 19.669153 ms
[reduce_v2]: sum = -1208.930542, total_time_2 = 19.637846 ms
[reduce_v3]: sum = -1208.927124, total_time_3 = 15.914701 ms
```

## References:
1. CUDA 编程:基础与实践 (CUDA Programming: Basics and Practice), 樊哲勇 (Fan Zheyong)
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
1 change: 0 additions & 1 deletion example/transpose/README.md → transpose/README.md
@@ -22,7 +22,6 @@ shared_memory optimization:
[device_transpose_v3] Average time: (10.317618) ms
```


## References
1. CUDA 编程:基础与实践 (CUDA Programming: Basics and Practice), 樊哲勇 (Fan Zheyong)
2. [CUDA Notes: Coalesced Memory Access](https://zhuanlan.zhihu.com/p/641639133)
File renamed without changes.
File renamed without changes.
File renamed without changes.
