
"ST.E.SYS" is not available in SASS instructions table #1

Open
Corle-hyz opened this issue Dec 6, 2021 · 5 comments

Comments

@Corle-hyz

To reproduce the results of the paper (SC'21), I performed the following steps in polybenchGpu/CUDA/2DCONV:

  1. module load cuda_flux
    Now cuda, llvm/11.0, and cuda_flux are loaded as modules.

  2. clang_cf++ --cuda-gpu-arch=sm_70 -std=c++11 -lcudart 2DConvolution.cu -o 2DConvolution
    There are some warnings, such as implicit conversion from 'int' to 'float', but the compilation completes. After compilation there are 4 new files: 2DConvolution, 2DConvolution.cu_fc8b3d24.bc, 2DConvolution.cu_fc8b3d24.out, and 2DConvolution.cu_fc8b3d24.ptx.
    Here is the first question: I have no idea how to generate ptx_traces/. We have the single *.ptx file, but ptx_traces/ was not found, and I didn't find any directions in the README.

  3. LD_PRELOAD=/home/PPT-GPU/tracing_tool/tracer.so ./2DConvolution
    After this we get app_config.py, PTX_Analysis.yml, bbc.txt, memory_traces/, and sass_traces/. This step is much slower than the times in the paper; it took me more than 8 minutes to obtain the SASS traces.

  4. python3 /home/PPT-GPU/ppt.py --app /home/polybench/polybenchGpu-master/CUDA/2DCONV/ --sass --config TITANV --granularity 2
    Here is the second question: when I execute this command, I receive the error '"ST.E.SYS" is not available in SASS instructions table' and the program exits.
    I looked up the PPT-GPU source code and found that there is no handling for ST.E.SYS, so I don't know how to work around it. Actually, when I profile my own application, the SASS trace also contains ST.E.SYS.

The hardware I used is a Tesla V100. (The commands above are consolidated in the sketch below.)
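For reference, the four steps above consolidated into one shell sequence (a minimal sketch; the module name, the /home/PPT-GPU tracer path, and the polybench path are the ones reported in this thread and will differ on other systems):

# 1. Load the tool modules (brings in cuda, llvm/11.0, and cuda_flux).
module load cuda_flux

# 2. Compile with the CUDA Flux wrapper for a V100 (sm_70). Later in this thread the
#    maintainer recommends also passing -O3 and -L<cuda install>/lib64, which is what
#    ultimately made ST.E.SYS disappear from the SASS traces.
clang_cf++ --cuda-gpu-arch=sm_70 -std=c++11 -lcudart 2DConvolution.cu -o 2DConvolution

# 3. Run under the tracing tool to collect app_config.py, sass_traces/, memory_traces/, etc.
LD_PRELOAD=/home/PPT-GPU/tracing_tool/tracer.so ./2DConvolution

# 4. Run the PPT-GPU prediction on the collected traces.
python3 /home/PPT-GPU/ppt.py --app /home/polybench/polybenchGpu-master/CUDA/2DCONV/ --sass --config TITANV --granularity 2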

@Corle-hyz
Author

These are the screenshots of the problem:

[screenshots attached]

@yehiaArafa
Member

Hi,

(2) If you are trying to get the PTX traces, please compile with -O3 and run the application. You should have the ptx_traces folder after the application finishes running. It looks like you have only compiled it without actually running it.
e.g.:
clang_cf++ --cuda-gpu-arch=sm_70 -std=c++11 -L/software/GPU/Cuda/10.2/lib64 -O3 -lcudart 2DConvolution.cu -o 2DConvolution

If you want to work on the SASS, you can skip the LLVM step.
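For example, a minimal SASS-only sequence (a sketch; compiling with plain nvcc here is an assumption, any sm_70 binary run under the tracer from step 3 should produce sass_traces/ and memory_traces/):

nvcc -O3 -arch=sm_70 2DConvolution.cu -o 2DConvolution
LD_PRELOAD=/home/PPT-GPU/tracing_tool/tracer.so ./2DConvolution
python3 /home/PPT-GPU/ppt.py --app /home/polybench/polybenchGpu-master/CUDA/2DCONV/ --sass --config TITANV --granularity 2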

(3) The memory trace extraction takes some time, depending on the size of the problem. We haven't reported the trace extraction time for each application; the time reported in the paper is for the predictions.

(4) ST.E.SYS is a generic memory operation. I think the 2D convolution from Polybench shouldn't have a generic memory operation. How are you building the application?
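For context, a hypothetical CUDA sketch of how a generic store can arise: when the compiler cannot tell at compile time whether a pointer refers to global or shared memory, the store goes through the generic address space (a generic ST.E-style SASS store rather than a dedicated STG or STS). At low optimization levels the compiler may also route otherwise-resolvable accesses through the generic space, which may be why ST.E.SYS disappears after re-compiling with -O3 later in this thread.

__global__ void mixed_store(float *g_out, int use_shared)
{
    __shared__ float s_buf[32];
    // The destination depends on a runtime value, so the address space of
    // 'dst' (global vs. shared) cannot be resolved at compile time...
    float *dst = use_shared ? s_buf : g_out;
    // ...and this store is therefore emitted through the generic address
    // space (an ST.E-style instruction) instead of STG or STS.
    dst[threadIdx.x % 32] = 1.0f;
}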

@Corle-hyz
Author

Thanks for your reply.

(2) I redid step (2) above, adding the -O3 and -L/usr/local/cuda/lib64 options, then ran the compiled application with ./2DConvolution, but there is still no ptx_traces/ folder.

(3) Solved.
It seems that I misunderstood the meaning of the time reported in the paper; sorry for the bother.

(4) Solved.
After re-compiling with the -O3 and -L/usr/local/cuda/lib64 options, ST.E.SYS disappears from the SASS traces, and I can successfully run PPT-GPU to make predictions now. Thanks very much!

The text below records what happened when I redid step (2):

[root@907effa70e67 2DCONV]# ls
2DConvolution.cu  2DConvolution.cuh  Makefile

[root@907effa70e67 2DCONV]# clang_cf++ --cuda-gpu-arch=sm_70 -std=c++11 -L/usr/local/cuda/lib64 -O3 -lcudart 2DConvolution.cu -o 2DConvolution
+ clang++ -Xclang -load -Xclang /opt/cuda-flux/lib/libcuda_flux_pass.so -finline-functions --cuda-gpu-arch=sm_70 -std=c++11 -L/usr/local/cuda/lib64 -O3 -lcudart 2DConvolution.cu -o 2DConvolution
2DConvolution.cu:65:28: warning: implicit conversion from 'int' to 'float' changes value from 2147483647 to 2147483648 [-Wimplicit-const-int-float-conversion]
                        A[i][j] = (float)rand()/RAND_MAX;
                                               ~^~~~~~~~
/usr/include/stdlib.h:128:18: note: expanded from macro 'RAND_MAX'
#define RAND_MAX        2147483647
                        ^~~~~~~~~~
2DConvolution.cu:138:2: warning: 'cudaThreadSynchronize' is deprecated [-Wdeprecated-declarations]
        cudaThreadSynchronize();
        ^
/usr/local/cuda/targets/x86_64-linux/include/cuda_runtime_api.h:957:8: note: 'cudaThreadSynchronize' has been explicitly marked deprecated here
extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaThreadSynchronize(void);
       ^
/usr/local/cuda/targets/x86_64-linux/include/cuda_runtime_api.h:238:42: note: expanded from macro '__CUDA_DEPRECATED'
#define __CUDA_DEPRECATED __attribute__((deprecated))
                                         ^
CUDA Flux: Instrumenting device code...
CUDA Flux: Module prefix: 2DConvolution.cu_fc8b3d24
CUDA Flux: Working on kernel: _Z20convolution2D_kerneliiPfS_
CUDA Flux: BlockCount: 4
2 warnings generated when compiling for sm_70.
2DConvolution.cu:65:28: warning: implicit conversion from 'int' to 'float' changes value from 2147483647 to 2147483648 [-Wimplicit-const-int-float-conversion]
                        A[i][j] = (float)rand()/RAND_MAX;
                                               ~^~~~~~~~
/usr/include/stdlib.h:128:18: note: expanded from macro 'RAND_MAX'
#define RAND_MAX        2147483647
                        ^~~~~~~~~~
2DConvolution.cu:138:2: warning: 'cudaThreadSynchronize' is deprecated [-Wdeprecated-declarations]
        cudaThreadSynchronize();
        ^
/usr/local/cuda/targets/x86_64-linux/include/cuda_runtime_api.h:957:8: note: 'cudaThreadSynchronize' has been explicitly marked deprecated here
extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaThreadSynchronize(void);
       ^
/usr/local/cuda/targets/x86_64-linux/include/cuda_runtime_api.h:238:42: note: expanded from macro '__CUDA_DEPRECATED'
#define __CUDA_DEPRECATED __attribute__((deprecated))
                                         ^
CUDA Flux: instrumenting host code...
CUDA Flux: CUDA Version 10.1
CUDA Flux: Found BasicBlockCount for kernel _Z20convolution2D_kerneliiPfS_: 4
2 warnings generated when compiling for host.

[root@907effa70e67 2DCONV]# ls
2DConvolution     2DConvolution.cu_fc8b3d24.bc   2DConvolution.cu_fc8b3d24.ptx  Makefile
2DConvolution.cu  2DConvolution.cu_fc8b3d24.out  2DConvolution.cuh

[root@907effa70e67 2DCONV]# ./2DConvolution
setting device 0 with name Tesla V100-PCIE-32GB
GPU Time in seconds:
0.001853
CPU Time in seconds:
0.078976
Non-Matching CPU-GPU Outputs Beyond Error Threshold of 0.05 Percent: 0

[root@907effa70e67 2DCONV]# ls
2DConvolution     2DConvolution.cu_fc8b3d24.bc   2DConvolution.cu_fc8b3d24.ptx  bbc.txt   PTX_Analysis.yml
2DConvolution.cu  2DConvolution.cu_fc8b3d24.out  2DConvolution.cuh              Makefile

@yehiaArafa
Member

Looks like the llvm tool has not been installed correctly, because there should be more files than the ones you are showing here.

We are working on providing a Docker file that has everything installed already. We will do that very soon.

@Corle-hyz
Author

Got it. I guess it's due to my llvm tool, too. Thank you very much again. I'm very interested in your work and looking forward to your Docker file! :-)
