
"ST.E.SYS" is not available in SASS instructions table #1

Open
Corle-hyz opened this issue Dec 6, 2021 · 5 comments

Comments

@Corle-hyz

To reproduce the results of the paper (SC'21), I performed the following steps in polybenchGpu/CUDA/2DCONV:

  1. module load cuda_flux
    Now cuda, llvm/11.0, and cuda_flux are loaded as modules.

  2. clang_cf++ --cuda-gpu-arch=sm_70 -std=c++11 -lcudart 2DConvolution.cu -o 2DConvolution
    There are some warnings, such as implicit conversion from 'int' to 'float', but the compilation completes. After compilation there are 4 new files: 2DConvolution, 2DConvolution.cu_fc8b3d24.bc, 2DConvolution.cu_fc8b3d24.out, and 2DConvolution.cu_fc8b3d24.ptx.
    Here is the first question: I have no idea how to generate ptx_traces/. We have the single *.ptx file, but ptx_traces/ was not found, and I didn't find any directions in the README.

  3. LD_PRELOAD=/home/PPT-GPU/tracing_tool/tracer.so ./2DConvolution
    After this we get app_config.py, PTX_Analysis.yml, bbc.txt, memory_traces/, and sass_traces/. This step is much slower than the times in the paper; it took me more than 8 minutes to obtain the SASS traces.

  4. python3 /home/PPT-GPU/ppt.py --app /home/polybench/polybenchGpu-master/CUDA/2DCONV/ --sass --config TITANV --granularity 2
    Here is the second question: when I execute this command, I receive the error '"ST.E.SYS" is not available in SASS instructions table' and the program exits.
    I looked up the PPT-GPU source code and found that there is no handling for ST.E.SYS, so I don't know how to work around it. Actually, when I profile my own application, the SASS trace also contains ST.E.SYS.

The hardware I used is a Tesla V100. (The commands above are consolidated in the sketch below.)
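For reference, the four steps above consolidated into one shell sequence (a minimal sketch; the module name, the /home/PPT-GPU tracer path, and the polybench path are the ones reported in this thread and will differ on other systems):

# 1. Load the tool modules (brings in cuda, llvm/11.0, and cuda_flux).
module load cuda_flux

# 2. Compile with the CUDA Flux wrapper for a V100 (sm_70). Later in this thread the
#    maintainer recommends also passing -O3 and -L<cuda install>/lib64, which is what
#    ultimately made ST.E.SYS disappear from the SASS traces.
clang_cf++ --cuda-gpu-arch=sm_70 -std=c++11 -lcudart 2DConvolution.cu -o 2DConvolution

# 3. Run under the tracing tool to collect app_config.py, sass_traces/, memory_traces/, etc.
LD_PRELOAD=/home/PPT-GPU/tracing_tool/tracer.so ./2DConvolution

# 4. Run the PPT-GPU prediction on the collected traces.
python3 /home/PPT-GPU/ppt.py --app /home/polybench/polybenchGpu-master/CUDA/2DCONV/ --sass --config TITANV --granularity 2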

@Corle-hyz
Author

These are the screenshots of the problem:

[screenshots attached]

@yehiaArafa
Member

Hi,

(2) If you are trying to get the PTX traces, please compile with -O3 and run the application. You should have the ptx_traces folder after the application finishes running. It looks like you have only compiled it without actually running it.
e.g.:
clang_cf++ --cuda-gpu-arch=sm_70 -std=c++11 -L/software/GPU/Cuda/10.2/lib64 -O3 -lcudart 2DConvolution.cu -o 2DConvolution

If you want to work on the SASS, you can skip the LLVM step.
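For example, a minimal SASS-only sequence (a sketch; compiling with plain nvcc here is an assumption, any sm_70 binary run under the tracer from step 3 should produce sass_traces/ and memory_traces/):

nvcc -O3 -arch=sm_70 2DConvolution.cu -o 2DConvolution
LD_PRELOAD=/home/PPT-GPU/tracing_tool/tracer.so ./2DConvolution
python3 /home/PPT-GPU/ppt.py --app /home/polybench/polybenchGpu-master/CUDA/2DCONV/ --sass --config TITANV --granularity 2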

(3) The memory trace extraction takes some time, depending on the size of the problem. We haven't reported the trace extraction time for each application; the time reported in the paper is for the predictions.

(4) ST.E.SYS is a generic memory operation. I think the 2D convolution from Polybench shouldn't have a generic memory operation. How are you building the application?
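For context, a hypothetical CUDA sketch of how a generic store can arise: when the compiler cannot tell at compile time whether a pointer refers to global or shared memory, the store goes through the generic address space (a generic ST.E-style SASS store rather than a dedicated STG or STS). At low optimization levels the compiler may also route otherwise-resolvable accesses through the generic space, which may be why ST.E.SYS disappears after re-compiling with -O3 later in this thread.

__global__ void mixed_store(float *g_out, int use_shared)
{
    __shared__ float s_buf[32];
    // The destination depends on a runtime value, so the address space of
    // 'dst' (global vs. shared) cannot be resolved at compile time...
    float *dst = use_shared ? s_buf : g_out;
    // ...and this store is therefore emitted through the generic address
    // space (an ST.E-style instruction) instead of STG or STS.
    dst[threadIdx.x % 32] = 1.0f;
}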

@Corle-hyz
Author

Thanks for your reply.

(2) I redid step (2) above, adding the -O3 and -L/usr/local/cuda/lib64 options, then ran the compiled application with ./2DConvolution, but there is still no ptx_traces/ folder.

(3) Solved.
It seems that I misunderstood the meaning of the time reported in the paper; sorry for the bother.

(4) Solved.
After re-compiling with the -O3 and -L/usr/local/cuda/lib64 options, ST.E.SYS disappears from the SASS traces, and I can successfully run PPT-GPU to make predictions now. Thanks very much!

The text below records what happened when I redid step (2):

[root@907effa70e67 2DCONV]# ls
2DConvolution.cu  2DConvolution.cuh  Makefile

[root@907effa70e67 2DCONV]# clang_cf++ --cuda-gpu-arch=sm_70 -std=c++11 -L/usr/local/cuda/lib64 -O3 -lcudart 2DConvolution.cu -o 2DConvolution
+ clang++ -Xclang -load -Xclang /opt/cuda-flux/lib/libcuda_flux_pass.so -finline-functions --cuda-gpu-arch=sm_70 -std=c++11 -L/usr/local/cuda/lib64 -O3 -lcudart 2DConvolution.cu -o 2DConvolution
2DConvolution.cu:65:28: warning: implicit conversion from 'int' to 'float' changes value from 2147483647 to 2147483648 [-Wimplicit-const-int-float-conversion]
                        A[i][j] = (float)rand()/RAND_MAX;
                                               ~^~~~~~~~
/usr/include/stdlib.h:128:18: note: expanded from macro 'RAND_MAX'
#define RAND_MAX        2147483647
                        ^~~~~~~~~~
2DConvolution.cu:138:2: warning: 'cudaThreadSynchronize' is deprecated [-Wdeprecated-declarations]
        cudaThreadSynchronize();
        ^
/usr/local/cuda/targets/x86_64-linux/include/cuda_runtime_api.h:957:8: note: 'cudaThreadSynchronize' has been explicitly marked deprecated here
extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaThreadSynchronize(void);
       ^
/usr/local/cuda/targets/x86_64-linux/include/cuda_runtime_api.h:238:42: note: expanded from macro '__CUDA_DEPRECATED'
#define __CUDA_DEPRECATED __attribute__((deprecated))
                                         ^
CUDA Flux: Instrumenting device code...
CUDA Flux: Module prefix: 2DConvolution.cu_fc8b3d24
CUDA Flux: Working on kernel: _Z20convolution2D_kerneliiPfS_
CUDA Flux: BlockCount: 4
2 warnings generated when compiling for sm_70.
2DConvolution.cu:65:28: warning: implicit conversion from 'int' to 'float' changes value from 2147483647 to 2147483648 [-Wimplicit-const-int-float-conversion]
                        A[i][j] = (float)rand()/RAND_MAX;
                                               ~^~~~~~~~
/usr/include/stdlib.h:128:18: note: expanded from macro 'RAND_MAX'
#define RAND_MAX        2147483647
                        ^~~~~~~~~~
2DConvolution.cu:138:2: warning: 'cudaThreadSynchronize' is deprecated [-Wdeprecated-declarations]
        cudaThreadSynchronize();
        ^
/usr/local/cuda/targets/x86_64-linux/include/cuda_runtime_api.h:957:8: note: 'cudaThreadSynchronize' has been explicitly marked deprecated here
extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaThreadSynchronize(void);
       ^
/usr/local/cuda/targets/x86_64-linux/include/cuda_runtime_api.h:238:42: note: expanded from macro '__CUDA_DEPRECATED'
#define __CUDA_DEPRECATED __attribute__((deprecated))
                                         ^
CUDA Flux: instrumenting host code...
CUDA Flux: CUDA Version 10.1
CUDA Flux: Found BasicBlockCount for kernel _Z20convolution2D_kerneliiPfS_: 4
2 warnings generated when compiling for host.

[root@907effa70e67 2DCONV]# ls
2DConvolution     2DConvolution.cu_fc8b3d24.bc   2DConvolution.cu_fc8b3d24.ptx  Makefile
2DConvolution.cu  2DConvolution.cu_fc8b3d24.out  2DConvolution.cuh

[root@907effa70e67 2DCONV]# ./2DConvolution
setting device 0 with name Tesla V100-PCIE-32GB
GPU Time in seconds:
0.001853
CPU Time in seconds:
0.078976
Non-Matching CPU-GPU Outputs Beyond Error Threshold of 0.05 Percent: 0

[root@907effa70e67 2DCONV]# ls
2DConvolution     2DConvolution.cu_fc8b3d24.bc   2DConvolution.cu_fc8b3d24.ptx  bbc.txt   PTX_Analysis.yml
2DConvolution.cu  2DConvolution.cu_fc8b3d24.out  2DConvolution.cuh              Makefile

@yehiaArafa
Member

Looks like the llvm tool has not been installed correctly, because there should be more files than the ones you are showing here.

We are working on providing a Docker file that has everything installed already. We will do that very soon.

@Corle-hyz
Author

Got it. I guess it's due to my llvm tool, too. Thank you very much again. I'm very interested in your work and looking forward to your Docker file! :-)
