README.md
benchmark.py
demo-matop.c
make.sh
matop.h
matrixTranspose.c
matrixTranspose_naive.cl
matrixTranspose_v1_4x1.cl
matrixTranspose_v1_4x1_colA.cl
matrixTranspose_v1_colA.cl
matrixTranspose_v2.cl
matrixTranspose_v2_colA.cl
matrixTranspose_v2_colA_4x4.cl
matrixTranspose_v2_colA_8x8.cl
matrixTranspose_v3.cl
matrixTranspose_v3_3.cl
matrixTranspose_v3_float16.cl
matrixTranspose_v3_float4.cl
matrixTranspose_v3_float8.cl
matrixTranspose_v3_half4.cl

mat-transpose

Build

$ ./make.sh

Usage

Some usage examples follow:

float16 type

# using command: ./matrixTranspose HEIGHTA WIDTHA KERNEL_FILE_PATH LOOP_EXECUTION_TIMES GLOBAL_WORK_SIZE[0] GLOBAL_WORK_SIZE[1] GLOBAL_WORK_SIZE[2]
$ ./matrixTranspose 2048 2048 ./matrixTranspose_v3_float16.cl 1 $((2048/4)) 2048 1
>>> 1 times CPU starting...
CPU 2048 x 2048 0.221085 s 18.971454 MFLOPS

>>> global_work_size[3]: (512, 2048, 1)
[WARN] global work size is smaller than task size.
>>> 1 times ./matrixTranspose_v3_float16.cl starting...
GPU 2048 x 2048 0.105491 s 39.759828 MFLOPS ./matrixTranspose_v3_float16.cl

>>> correct rate: 1.0000
>>> ~ Bingo ~ matrix a == matrix b
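
The MFLOPS column in the output above appears to be the number of matrix elements divided by the elapsed time, in millions: 2048 x 2048 elements in 0.221085 s gives about 18.97, which matches the log. A minimal sketch of that arithmetic, assuming this is how the tool derives the figure (`throughput_m_per_s` is a hypothetical helper, not part of this repository):

```python
def throughput_m_per_s(height, width, seconds):
    """Millions of matrix elements processed per second (assumed metric)."""
    return height * width / seconds / 1e6


# Timings copied from the float16 run shown above.
cpu = throughput_m_per_s(2048, 2048, 0.221085)
gpu = throughput_m_per_s(2048, 2048, 0.105491)
print(f"CPU {cpu:.6f}  GPU {gpu:.6f}  speedup {gpu / cpu:.2f}x")
```

A transpose performs no floating-point arithmetic, so the figure is better read as element throughput than as FLOPS.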

float type

# using command: ./matrixTranspose HEIGHTA WIDTHA KERNEL_FILE_PATH LOOP_EXECUTION_TIMES GLOBAL_WORK_SIZE[0] GLOBAL_WORK_SIZE[1] GLOBAL_WORK_SIZE[2]
$ ./matrixTranspose 2048 2048 ./matrixTranspose_v1.cl 1 $((2048*2048)) 1 1
>>> 1 times CPU starting...
CPU 2048 x 2048 0.220617 s 19.011699 MFLOPS

>>> global_work_size[3]: (4194304, 1, 1)
[WARN] global work size is smaller than task size.
[WARN] using kernel-v1, the second and third dim of global work size should be one.
>>> new global_work_size[3]: (4194304, 1, 1)
>>> 1 times ./matrixTranspose_v1.cl starting...
GPU 2048 x 2048 0.089062 s 47.094204 MFLOPS ./matrixTranspose_v1.cl

>>> correct rate: 1.0000
>>> ~ Bingo ~ matrix a == matrix b
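
The `correct rate` line reports how well the GPU result agrees with the CPU result. A rough sketch of such a check on flat row-major buffers, assuming the rate is the fraction of matching elements (the function names are hypothetical, and the actual C implementation in `matrixTranspose.c` may differ):

```python
def transpose_row_major(a, height, width):
    """Transpose a height x width matrix stored as a flat row-major list."""
    b = [0.0] * (height * width)
    for r in range(height):
        for c in range(width):
            # Element (r, c) of A becomes element (c, r) of B (width x height).
            b[c * height + r] = a[r * width + c]
    return b


def correct_rate(expected, actual, tol=1e-6):
    """Fraction of positions where the two buffers agree within tol."""
    matches = sum(abs(x - y) <= tol for x, y in zip(expected, actual))
    return matches / len(expected)


a = [float(i) for i in range(2 * 3)]  # 2 x 3 matrix: [[0, 1, 2], [3, 4, 5]]
b = transpose_row_major(a, 2, 3)      # 3 x 2 matrix: [[0, 3], [1, 4], [2, 5]]
print(b, correct_rate(b, transpose_row_major(a, 2, 3)))
```

Comparing with a small tolerance rather than exact equality is a design choice for this sketch; for a pure data movement such as a transpose, exact comparison would also be reasonable.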
