Trouble compiling on Pi 4 #743

Open
krakenrf opened this issue Nov 20, 2024 · 20 comments

@krakenrf

Hi, the docs say this can be built on a Raspberry Pi, but I've been trying to compile this on a Pi 4 running Ubuntu 24.10 for the last few days without luck. Running on a 64GB SD card which I hope is big enough.

I've managed to get a little further each time with some fixes:

  1. First I couldn't run fetch_sources as Git simply did not want to download llvm as I think it's too big. Using --shallow helped.

  2. I kept getting an error cc: error: unrecognized command-line option during make. It seems that the compiler it was trying to use was gcc by default. I forced it to use clang by adding -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ to cmake.

  3. I have the Pi 4 2GB and 8GB models. At first I tried the 2GB model, and the compilation just kept crashing and causing the entire board to restart. Switching to the 8GB model helped me compile to a higher percentage, but it still crashed either with a reset, or just the terminal window closing. I then increased the swap space to 16GB, and again that got me further, but I still end up with a crash, or just with the terminal window closing itself without completion.

  4. I tried cmake ../ -DCLVK_BUILD_TESTS=OFF -DLLVM_INCLUDE_BENCHMARKS=OFF -DLLVM_INCLUDE_TESTS=OFF -DLLVM_ENABLE_BINDINGS=OFF -DLLVM_ENABLE_UNWIND_TABLES=OFF -DLLVM_BUILD_TOOLS=OFF -DCLSPV_BUILD_SPIRV_DIS=OFF -DCLSPV_BUILD_TESTS=OFF -DCLVK_BUILD_TESTS=OFF -DCLVK_BUILD_SPIRV_TOOLS=OFF -DCLVK_ENABLE_SPIRV_IL=OFF -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ to try to cut down on some of the things that need to compile, but at some point during an overnight compile the terminal window just closes itself, and the compilation never completes.

  5. Running make -j1 to force one core seems to help to get it a little further, but ultimately I still end up with a reset or terminal window close.

The culprit appears to be the LLVM compilation, which it never gets past. My guess is that the crashes come from running out of memory, but I've never seen the RAM bar full in htop during compilation. The crashes always occur when I leave it running overnight, though, so I may be missing it.
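
A minimal sketch of how the out-of-memory guess could be checked without watching htop overnight. It assumes a Linux system with /proc/meminfo; the function names `mem_summary` and `count_oom_lines` are made up for this illustration and are not part of clvk.

```shell
#!/bin/sh
# Print total/available RAM and swap (in MiB) from a meminfo-style file.
# Defaults to /proc/meminfo; the optional argument only exists to ease testing.
mem_summary() (
    file="${1:-/proc/meminfo}"
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        /^SwapTotal:/    { stotal = $2 }
        /^SwapFree:/     { sfree = $2 }
        END {
            printf "ram_total_mib=%d ram_available_mib=%d swap_total_mib=%d swap_free_mib=%d\n",
                   total / 1024, avail / 1024, stotal / 1024, sfree / 1024
        }' "$file"
)

# Count kernel log lines (read from stdin) that mention the OOM killer.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_oom_lines() {
    grep -c -i -E 'out of memory|oom-kill' || true
}
```

One way to use it: log `mem_summary` to a file once a minute while the build runs, then after a crash pipe the previous boot's kernel log into the counter, for example `journalctl -k -b -1 | count_oom_lines` (this only works if the journal is persistent). A count above zero would support the memory explanation; a hard reset with a count of zero would point elsewhere.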

I'm interested to hear whether anyone has anything else I should try, or whether anyone has successfully built it on a Pi 4 recently.

@rjodinchr
Contributor

You can try to use a prebuilt libclc to reduce the build load.

Build libclc on something other than a Raspberry Pi:
https://github.com/kpet/clvk/blob/main/.github/workflows/presubmit.yml#L112

Copy clspv--.bc and clspv64--.bc to your Raspberry Pi, and then add the following to your cmake command:

-DCLSPV_EXTERNAL_LIBCLC_DIR=<place_where_clspv--.bc_is>
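
A small sanity check along these lines can catch a wrong copy before cmake does. This is only a sketch: `check_libclc_dir` is a hypothetical helper, and the 64-bit file name follows the spelling used later in this thread (`clspv64--.bc`).

```shell
#!/bin/sh
# Verify that a directory holds the prebuilt libclc bitcode files before
# handing it to cmake. Reports each missing file on stderr and returns
# non-zero if any file is missing.
check_libclc_dir() (
    dir="$1"
    status=0
    for f in clspv--.bc clspv64--.bc; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    exit "$status"
)
```

Usage would be something like `check_libclc_dir /path/to/libclc && cmake ...`, so that cmake only runs when both files are present.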

@krakenrf
Author

Thanks, I just tried that. I couldn't figure out how to just compile clspv, so I just compiled the whole clvk project on a fast Linux computer, and found the clspv files, and copied them over to the Pi 4. Then I ran:

cmake ../ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCLSPV_EXTERNAL_LIBCLC_DIR=../libclc

But I'm getting stuck on this error now:

[  0%] Built target clspv_baked_opencl_header
make[2]: *** No rule to make target 'external/clspv/libclc/clspv--.bc', needed by 'external/clspv/include/clspv/clspv_builtin_library.h'.  Stop.
make[1]: *** [CMakeFiles/Makefile2:245790: external/clspv/cmake/CMakeFiles/clspv_builtin_library.dir/all] Error 2

@rjodinchr
Contributor

It looks like clspv--.bc is simply not in the external/clspv/libclc directory. Are you sure about what you copied?

@krakenrf
Author

krakenrf commented Nov 21, 2024

I had put them in ../libclc, but I also tried putting them in external/clspv/libclc, but with the same results.

I'm guessing it's not as simple as just copying the clspv--.bc over from something I built on another platform. Looking at the readme compilation of clspv is under cross-compilation, so I'm assuming I need to do something with cross-compilation?

Unfortunately, those instructions in the readme are not clear to me. I gave it a go: on the x64 machine I created a folder clang_host and ran the cmake command, but got the error: The source directory "home/carl/clvk/external/clspv/third_party/llvm" does not appear to contain CMakeLists.txt

@rjodinchr
Contributor

Alright, then the issue is ../libclc. This is a relative path; avoid those where you can, because you never know which directory they will end up being resolved against.
Instead use the following: -DCLSPV_EXTERNAL_LIBCLC_DIR=$(realpath ../libclc)
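
To illustrate what the realpath form buys you, here is a minimal sketch. `libclc_flag` is a hypothetical helper name; it only prints the flag with the directory resolved to an absolute path, so the value means the same thing whichever directory the build later runs in.

```shell
#!/bin/sh
# Print the cmake flag with the libclc directory resolved to an absolute path.
# $1: path to the libclc directory, relative or absolute.
libclc_flag() {
    printf '%s\n' "-DCLSPV_EXTERNAL_LIBCLC_DIR=$(realpath "$1")"
}
```

Run from the build directory, `libclc_flag ../libclc` prints the flag with the full path of the sibling libclc directory, which can then be pasted into the cmake command line.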

@krakenrf
Author

Nice! Looks like that worked well, I was finally able to complete the compilation and it's working. Thanks!

@krakenrf
Author

Unfortunately while it compiled, and the software shows a selectable V3D option, it doesn't seem to actually work.

I ran ./simple_test and got:

Platform: clvk
Device: V3D 4.2.14
/home/dd/clvk/tests/simple/simple.cpp:76 error after CL call: -11

Interestingly, it also seems to have failed on my Ubuntu laptop, as I get the same error on it:

Platform: clvk
Device: Intel(R) UHD Graphics (TGL GT1)
/home/carl/clvk/tests/simple/simple.cpp:76 error after CL call:-11

I guess the clspv--.bc I compiled on the laptop must not be working?

@rjodinchr
Contributor

I would not expect the libclc binaries to be the issue here, but it's not impossible.

Could you run it with the following environment variables set:

CLVK_LOG=4
CLVK_LOG_DEST=file:clvk.log

and upload clvk.log here?
That would help us analyse what is going wrong.
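
For convenience, the two variables can be wrapped in a tiny helper. This is just a sketch; `run_with_clvk_log` is a made-up name, and the variable names and values are the ones given above.

```shell
#!/bin/sh
# Run a command with clvk logging enabled, using the two variables from the
# comment above. The log is written to clvk.log in the current directory.
run_with_clvk_log() {
    CLVK_LOG=4 CLVK_LOG_DEST=file:clvk.log "$@"
}
```

For example, `run_with_clvk_log ./simple_test` runs the test binary with logging on and leaves the variables unset in the calling shell afterwards.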

@krakenrf
Author

Thanks, here is the output on my Ubuntu 22.04 laptop. Will get the Pi 4 one later.

clvk.log

@rjodinchr
Contributor

Hmm, I see.
When we build the tests, we create a clvk.conf in the build folder.
That file contains an entry that prevents compilation.

You can either remove that file or run the binary from another directory.
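
A minimal sketch of the first option, assuming clvk picks up a clvk.conf from the directory the binary is run in (that is how the advice above reads, but the lookup rules are not spelled out in this thread). `sideline_clvk_conf` is a hypothetical helper that renames the file rather than deleting it, so it can be restored.

```shell
#!/bin/sh
# If a clvk.conf exists in the given directory (default: current directory),
# rename it so it is no longer picked up, and report what was done.
sideline_clvk_conf() (
    dir="${1:-.}"
    if [ -f "$dir/clvk.conf" ]; then
        mv "$dir/clvk.conf" "$dir/clvk.conf.disabled"
        echo "moved $dir/clvk.conf to $dir/clvk.conf.disabled"
    else
        echo "no clvk.conf in $dir"
    fi
)
```

Calling `sideline_clvk_conf` from the build folder before running ./simple_test would take the generated file out of the way.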

@kpet
Owner

kpet commented Nov 26, 2024

We should fix this. There's no reason for this broken config file to end up in the default location :).

rjodinchr added a commit to rjodinchr/clvk that referenced this issue Nov 26, 2024
Add the path to the config files in the sources instead.

Ref kpet#743

@krakenrf
Author

krakenrf commented Nov 27, 2024

Thanks, that does fix the issue with the test not running. The test works fine now.

However, I'm still having issues getting a kernel to run from a program I'm trying to test clvk with. I'm not entirely sure if it's the kernel that is incompatible, or something wrong with my clvk install (or probably both), but the error I currently get makes it seem like my clvk install is not right.

First, this is my command to start the software (I tried both clspv64 and clspv - not sure which one I should use but they both give the same error):

CLVK_CLSPV_PATH=/home/dd/clvk/libclc/clspv64--.bc LD_LIBRARY_PATH=/home/dd/clvk/build/ satdump-ui

When running something that uses the GPU the error is:

(E) Error warping on GPU : Error building: /usr/bin/lli-15: lli: /home/dd/clvk/libclc/clspv64--.bc: error: Unknown attribute kind (86) (Producer: 'LLVM20.0.0git' Reader: 'LLVM 15.0.6')


The error made me think it had something to do with the system clang version, so I later installed llvm20/clang20 manually and tried to recompile clvk on a Pi 5. (Unlike the Pi 4, the Pi 5 can complete the entire compilation with the default clang, using make -j1, 8GB of RAM and 16GB of swap.) But the compilation with clang20 installed failed with the following error:

[ 93%] Building CXX object external/SPIRV-Tools/source/CMakeFiles/SPIRV-Tools-static.dir/val/validate_decorations.cpp.o
In file included from /home/dd/clvk/external/SPIRV-Tools/source/val/validate_decorations.cpp:15:
In file included from /usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/algorithm:61:
In file included from /usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:61:
/usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/bits/stl_tempbuf.h:263:8: error: 'get_temporary_buffer<MemberOffsetPair>' is deprecated [-Werror,-Wdeprecated-declarations]
  263 |                 std::get_temporary_buffer<value_type>(_M_original_len));
      |                      ^
/usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:4996:15: note: in instantiation of member function 'std::_Temporary_buffer<__gnu_cxx::__normal_iterator<MemberOffsetPair *, std::vector<MemberOffsetPair>>, MemberOffsetPair>::_Temporary_buffer' requested here
 4996 |       _TmpBuf __buf(__first, (__last - __first + 1) / 2);
      |               ^
/usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:5070:23: note: in instantiation of function template specialization 'std::__stable_sort<__gnu_cxx::__normal_iterator<MemberOffsetPair *, std::vector<MemberOffsetPair>>, __gnu_cxx::__ops::_Iter_comp_iter<(lambda at /home/dd/clvk/external/SPIRV-Tools/source/val/validate_decorations.cpp:482:9)>>' requested here
 5070 |       _GLIBCXX_STD_A::__stable_sort(__first, __last,
      |                       ^
/home/dd/clvk/external/SPIRV-Tools/source/val/validate_decorations.cpp:480:10: note: in instantiation of function template specialization 'std::stable_sort<__gnu_cxx::__normal_iterator<MemberOffsetPair *, std::vector<MemberOffsetPair>>, (lambda at /home/dd/clvk/external/SPIRV-Tools/source/val/validate_decorations.cpp:482:9)>' requested here
  480 |     std::stable_sort(
      |          ^
/usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/bits/stl_tempbuf.h:99:5: note: 'get_temporary_buffer<MemberOffsetPair>' has been explicitly marked deprecated here
   99 |     _GLIBCXX17_DEPRECATED
      |     ^
/usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/aarch64-linux-gnu/c++/12/bits/c++config.h:119:34: note: expanded from macro '_GLIBCXX17_DEPRECATED'
  119 | # define _GLIBCXX17_DEPRECATED [[__deprecated__]]
      |                                  ^
1 error generated.
make[2]: *** [external/SPIRV-Tools/source/CMakeFiles/SPIRV-Tools-static.dir/build.make:656: external/SPIRV-Tools/source/CMakeFiles/SPIRV-Tools-static.dir/val/validate_decorations.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:12847: external/SPIRV-Tools/source/CMakeFiles/SPIRV-Tools-static.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

rjodinchr added a commit to rjodinchr/clvk that referenced this issue Nov 27, 2024
Add the path to the config files in the sources instead.

Ref kpet#743

@rjodinchr
Contributor

clspv--.bc is not the clspv compiler; it is the libclc library used by the compiler. It is statically linked into clspv, so once clspv is compiled it is not needed anymore.

You should not need to define CLVK_CLSPV_PATH to use clvk, did you encounter an error trying to run without it?
You can try to compile clvk with -DCLVK_CLSPV_ONLINE_COMPILER=1 (in your cmake arguments). That will link clspv statically in clvk's OpenCL library (libOpenCL.so).
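
Putting the flags from this thread together, a configure step might look like the sketch below. `clvk_cmake_args` is a hypothetical helper that only prints arguments; every flag it prints is one quoted earlier in this thread, and nothing beyond those is implied.

```shell
#!/bin/sh
# Print, one per line, the cmake arguments collected from this thread.
# $1 (optional): directory containing the prebuilt libclc .bc files.
clvk_cmake_args() {
    printf '%s\n' "-DCMAKE_C_COMPILER=clang"
    printf '%s\n' "-DCMAKE_CXX_COMPILER=clang++"
    printf '%s\n' "-DCLVK_CLSPV_ONLINE_COMPILER=1"
    if [ -n "${1:-}" ]; then
        # Absolute path, for the reason given earlier in the thread.
        printf '%s\n' "-DCLSPV_EXTERNAL_LIBCLC_DIR=$(realpath "$1")"
    fi
}
```

From the build directory this could be used as `cmake ../ $(clvk_cmake_args ../libclc)`. The unquoted substitution relies on word splitting, so it breaks if the libclc path contains spaces.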

@krakenrf
Author

krakenrf commented Nov 27, 2024

Ah okay, that's my bad. I must have misunderstood the readme.

When running without CLVK_CLSPV_PATH, I still end up with warnings and error building, but it might be kernel compatibility related:

(E) Error warping on GPU : Error building: clvk-fJ1Bn1/source.cl:34:13: warning: comparing floating point with == or != is unsafe
   34 |   if (shift == 0)
      |       ~~~~~ ^  ~
clvk-fJ1Bn1/source.cl:37:9: warning: mixing declarations and code is incompatible with standards before C99
   37 |   float x = cos(*lat * DEG_TO_RAD) * cos(*lon * DEG_TO_RAD);
      |         ^
clvk-fJ1Bn1/source.cl:59:9: warning: mixing declarations and code is incompatible with standards before C99
   59 |   float dist1 = SQ(xr[1] - pxy[0]) + SQ(yr[1] - pxy[1]);
      |         ^
clvk-fJ1Bn1/source.cl:79:23: warning: implicit conversion from '__private int' to 'float' may lose precision
   79 |   float x_diff = rx - x;
      |                     ~ ^
clvk-fJ1Bn1/source.cl:80:23: warning: implicit conversion from '__private int' to 'float' may lose precision
   80 |   float y_diff = ry - y;
      |                     ~ ^
clvk-fJ1Bn1/source.cl:82:29: warning: implicit conversion changes signedness: 'int' to 'size_t' (aka 'unsigned i

Previously I tried fixing these warnings as they were just simple C issues, but ultimately I still ended up with:

(E) Error warping on GPU : Error building:

And then no additional error message or warnings given.

I guess it's just that this kernel would need some work to be compatible with the limitations of clvk and clspv? https://github.com/SatDump/SatDump/blob/master/resources/opencl/warp_image_thin_plate_spline_fp32.cl

But I also tried stripping down the kernel to just an empty function to see whether it would at least build without an error, and it still failed with the same error. I admit I don't really know what I'm doing with the kernel, or whether the errors I'm getting are SatDump related or clvk related.

kernel void warp_image_thin_plate_spline(
    global ushort *map_image,
    global ushort *img,
    global int *tps_no_points,
    global float *tps_x,
    global float *tps_y,
    global float *tps_coef_1,
    global float *tps_coef_2,
    global float *tps_xmean,
    global float *tps_ymean,
    global int *img_settings) {

    // Suppress warnings
    (void)map_image;
    (void)img;
    (void)tps_no_points;
    (void)tps_x;
    (void)tps_y;
    (void)tps_coef_1;
    (void)tps_coef_2;
    (void)tps_xmean;
    (void)tps_ymean;
    (void)img_settings;
}

Is there any way to just compile the kernel with clvk or clspv standalone, outside of the SatDump software? Then at least I can confirm if the kernel is compatible or not.
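
One possible approach, sketched under assumptions: clspv is a standalone command-line compiler that takes an OpenCL C file and writes SPIR-V with `-o`. Where the clspv binary ends up in a clvk build tree is not stated in this thread, so the hypothetical helper `compile_kernel` below takes its path as an argument.

```shell
#!/bin/sh
# Compile one OpenCL C file to SPIR-V with a standalone clspv binary.
# $1: path to the clspv executable, $2: kernel source, $3: output .spv file.
compile_kernel() (
    clspv_bin="$1"; src="$2"; out="$3"
    if [ ! -x "$clspv_bin" ]; then
        echo "clspv not found or not executable: $clspv_bin" >&2
        exit 127
    fi
    # Any diagnostics from clspv go to the terminal; its exit status is returned.
    "$clspv_bin" "$src" -o "$out"
)
```

If this succeeds on the kernel file, the source itself is accepted by clspv. A build failure inside the application would then more likely come from what the target Vulkan device supports than from the kernel syntax.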

@rjodinchr
Contributor

Please provide an updated clvk.log from a run with the application unmodified (#743 (comment))

@krakenrf
Author

clvk.log

Ah looks like support is missing for Int16?

@krakenrf
Author

Also I just want to add another observation. When I run SatDump with clvk and the same kernel on my Intel laptop, it compiles, runs, and finishes. But no final image is generated for some reason. The warped image is just not there.

If I run the warp on the Intel laptop without clvk, using the native OpenCL implementation, it works fine.

The same problem with the image not appearing happens on the Raspberry Pi 4/5 and Intel Laptop if I try to run it with llvmpipe.

I'll attach clvk.log for a llvmpipe run on the Pi 4 just in case it helps.

clvk.log

@rjodinchr
Contributor

The failure you have right now occurs because the Vulkan driver that clvk tries to use does not support spv::CapabilityInt16.

But you have 2 Vulkan implementations on your platform:

[CLVK] Found 2 physical devices
[CLVK] linux_read_sorted_physical_devices:
[CLVK]      Original order:
[CLVK]            [0] llvmpipe (LLVM 15.0.6, 128 bits)
[CLVK]            [1] V3D 4.2.14

[0] llvmpipe is a software emulation, and it's the one being used by your application. On top of that it does not support Int16.

I think what you want is to use [1] V3D, which may support Int16.

@krakenrf
Author

I'm fairly certain it is using V3D. In the SatDump software I can choose between V3D and llvmpipe as the OpenCL device to use.

Choosing V3D yields the first clvk.log I posted, and the GPU building error.

Choosing llvmpipe yields the second clvk.log I posted. No error, but no image.

@rjodinchr
Contributor

Alright, then your issue with V3D is that it does not support Int16. At the moment clspv is not capable of generating the code without it.

For the issue with llvmpipe, I don't see anything in the log.
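
To see what each Vulkan device reports for 16-bit integers, the output of vulkaninfo can be filtered. The sketch below assumes the text output contains `deviceName = ...` and `shaderInt16 = ...` lines for each device, with the name appearing first; `report_int16` is a made-up helper name.

```shell
#!/bin/sh
# Read `vulkaninfo` text output on stdin and print, for each device, its name
# and the value reported for the shaderInt16 feature.
report_int16() {
    awk '
        /deviceName[ \t]*=/  { sub(/^.*=[ \t]*/, ""); name = $0 }
        /shaderInt16[ \t]*=/ { sub(/^.*=[ \t]*/, ""); print name ": shaderInt16=" $0 }
    '
}
```

Usage: `vulkaninfo | report_int16`. A device that reports false here cannot run SPIR-V modules that declare the Int16 capability.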

kpet pushed a commit that referenced this issue Nov 30, 2024
Add the path to the config files in the sources instead.

Ref #743

kpet added a commit that referenced this issue Dec 1, 2024
* Add SPIRV Capabilities error in build logging

Ref #743

* Update src/program.cpp

Co-authored-by: Kévin Petit