Towards Beagle+OpenCL on a Raspberry Pi #157
Comments
Increasing GPU memory
I have activated
and it showed me that the
I don't know how the work size limitations, the global work size, and the local work size interact (as I said, I'm terribly new to all this low-level programming). Can we make BEAGLE work with “Max work item sizes 12x12x12; Max work group size 12”?
@Anaphory, great work getting
You are welcome to provide a
A word of caution, however: almost every kernel is currently hard-coded for local work sizes that are "mod 16" (multiples of 16), since 16 is the magic memory-coalescence size for most GPUs and is also super-convenient for 4 x 4 nucleotide models. I don't expect, a priori, that you'll get much performance gain from the VideoCore.
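To make the interaction asked about above concrete: in OpenCL 1.2, clEnqueueNDRangeKernel rejects a launch when a local size exceeds the per-dimension limit (CL_INVALID_WORK_ITEM_SIZE), when the product of the local sizes exceeds the device's maximum work-group size, or when a global size is not evenly divisible by the matching local size (both CL_INVALID_WORK_GROUP_SIZE). The Python sketch below mirrors those three checks only. It is an illustration, not BEAGLE or VC4CL code: the function name check_ndrange is made up, the per-kernel limit CL_KERNEL_WORK_GROUP_SIZE is not modelled, and the local size (16, 16, 1) is a hypothetical "mod 16" configuration, not a value read from BEAGLE.

```python
from math import prod


def check_ndrange(global_size, local_size, max_item_sizes, max_group_size):
    """List the reasons an OpenCL 1.2 NDRange launch would be rejected.

    global_size, local_size and max_item_sizes are per-dimension tuples;
    max_group_size plays the role of CL_DEVICE_MAX_WORK_GROUP_SIZE.
    An empty list means the launch passes these checks (the per-kernel
    limit CL_KERNEL_WORK_GROUP_SIZE is deliberately not modelled).
    """
    problems = []
    for dim, (g, l, m) in enumerate(zip(global_size, local_size, max_item_sizes)):
        # Each local dimension must fit CL_DEVICE_MAX_WORK_ITEM_SIZES[dim].
        if l > m:
            problems.append(f"CL_INVALID_WORK_ITEM_SIZE: local[{dim}]={l} > {m}")
        # OpenCL 1.x requires each global size to be a multiple of the local size.
        if g % l != 0:
            problems.append(
                f"CL_INVALID_WORK_GROUP_SIZE: global[{dim}]={g} not divisible by local[{dim}]={l}"
            )
    # The whole work-group (product of the local sizes) must fit the device limit.
    group = prod(local_size)
    if group > max_group_size:
        problems.append(f"CL_INVALID_WORK_GROUP_SIZE: {group} work-items per group > {max_group_size}")
    return problems


# Limits reported by clinfo for the Pi's VideoCore IV (VC4CL), as quoted in this issue.
VC4_MAX_ITEMS = (12, 12, 12)
VC4_MAX_GROUP = 12

# Hypothetical "mod 16" local size against the reported global size [256, 16, 1].
for p in check_ndrange((256, 16, 1), (16, 16, 1), VC4_MAX_ITEMS, VC4_MAX_GROUP):
    print(p)
# CL_INVALID_WORK_ITEM_SIZE: local[0]=16 > 12
# CL_INVALID_WORK_ITEM_SIZE: local[1]=16 > 12
# CL_INVALID_WORK_GROUP_SIZE: 256 work-items per group > 12

# A local size that satisfies all three checks on this device.
print(check_ndrange((256, 16, 1), (4, 2, 1), VC4_MAX_ITEMS, VC4_MAX_GROUP))  # []
```

With power-of-two global sizes such as [256, 16, 1], every local size must also be a power of two, so on a device capped at 12 work-items per group the largest usable group has 8 work-items, e.g. (4, 2, 1) or (8, 1, 1). Since the kernels are described above as hard-coded for multiples of 16, supporting this would likely need kernel changes, not just a different launch configuration.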
As I said, I don't really know anything about programming this close to hardware. I have no idea what work sizes and their limits actually mean, how BEAGLE works in general, or what a GPU kernel needs to do. So I have absolutely no clue how to start working on such a pull request. I have just put my dumb question about modifying work sizes to the VC4CL developers, as you see. Maybe you can check over there to see whether I horribly misrepresented the issue?
Hey, maintainer of https://github.com/doe300/VC4CL here. As I already stated in doe300/VC4CL#101, work-group sizes of more than 12 are currently not possible because of hardware and implementation limitations. I also have to agree that you should not have too high expectations regarding performance on the Raspberry Pi GPU. From a short browse through the repository, I could not find the OpenCL kernel sources. If someone could point me in that direction, I could try to see whether something can be done to add support.
This is less a BEAGLE issue, and more a post for your information. Maybe it can be tested, completed, and put somewhere useful.
In theory, I got OpenCL, BEAGLE and BEAST2 to work together on a Raspberry Pi. In practice, the tests and BEAST runs still seem to go beyond the limits of the tiny GPU; maybe there's a way to mitigate that.
In this post, I may have forgotten some of the steps I took during the trial and error that got me to this point. I have a second Pi with a clean environment arriving in a few days; I'll try to follow my own instructions on it and update them where necessary afterwards.
Operating Environment
I have a Raspberry Pi 3 model B V1.2, with an out-of-the-box Raspbian 10 (buster).
Shared libraries go into /usr/local/lib, so I work with
throughout.
OpenCL
Software:
There is a partial OpenCL implementation for the Raspberry Pi (Embedded Profile, targeting OpenCL 1.2 as the last version that can be completely supported). I compiled and installed it according to the instructions in a separate Git repository, building VC4CL with the settings
cmake -DBUILD_TESTING=ON -DBUILD_ICD=ON ..
and then
sudo make install
TestVC4CL went through without a hitch (I say that, but at one point closing Chromium on the Pi made the difference between the compilation succeeding and being killed for lack of memory). I did not compile the hello_word.cl test. I don't know whether the ICD is necessary, but at least it didn't hurt, so I activated it as instructed using
The outcome seems to be a properly installed OpenCL. (I may be running too many commands with superuser rights in this post, but superuser permissions are needed for running on the GPU, so all hope is lost anyway. The sudo here is definitely necessary to access /dev/mem.)
Beagle
The first time I tried compiling BEAGLE, the tests complained about the missing SSE instruction set, so I configured BEAGLE as
and compiled it as instructed.
make check
fails with
due to
CL_INVALID_WORK_GROUP_SIZE from file <GPUInterfaceOpenCL.cpp>, line 584.
I put a debug output there to check which work group size the code wants to use, and it shows that the global work size array has entries [256, 16, 1]. clinfo said that the GPU has “Max work item sizes 12x12x12; Max work group size 12”, so that's not surprising, and I tried to continue.
I ran
on a random BEAST2 XML I had lying around, not particularly chosen to be small or anything, and it failed with an
OpenCL error: CL_OUT_OF_RESOURCES from file <GPUInterfaceOpenCL.cpp>, line 787.
(Running it without -beagle_GPU works, contrary to what I wrote in #156, and is about 2.5 times faster than BEAST without BEAGLE, so that's quite good already. It's still about 8 times slower than un-BEAGLEd BEAST on my generic laptop, but in the long run it may be useful to someone beyond teaching. After all, a Pi costs less than one eighth of a laptop…)
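The cost argument in the last paragraph can be made explicit as throughput per unit price. The Python sketch below is a rough back-of-the-envelope calculation using only the ratios quoted in the post; it assumes those ratios hold for the same BEAST2 workload and that purchase price is the only cost (power draw and setup time are ignored). The function name relative_value is made up for illustration. Because "less than one eighth the cost" is an upper bound on the Pi's price ratio, the real values are at least the ones printed.

```python
from fractions import Fraction


def relative_value(speed_ratio, cost_ratio):
    """Throughput per unit price of machine A relative to machine B.

    speed_ratio = speed(A) / speed(B); cost_ratio = price(A) / price(B).
    A result >= 1 means A delivers at least as much work per unit of money.
    """
    return speed_ratio / cost_ratio


# Ratios quoted in the post, Pi relative to the laptop.
pi_beagle_speed = Fraction(1, 8)                   # Pi with BEAGLE (no GPU): ~8x slower than plain BEAST on the laptop
pi_plain_speed = pi_beagle_speed / Fraction(5, 2)  # BEAGLE made the Pi ~2.5x faster, so plain BEAST there is ~1/20
pi_cost = Fraction(1, 8)                           # upper bound: "less than one eighth the cost"

print(relative_value(pi_beagle_speed, pi_cost))  # 1    (break-even at worst)
print(relative_value(pi_plain_speed, pi_cost))   # 2/5
```

Under these assumptions, BEAGLE is what moves the Pi from well below the laptop's value for money (at most 2/5 without it) to at least break-even.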