Assorted fixes and OpenCL enhancements #153

RSpliet · 2019-10-18T11:53:20Z

A second instalment of patches that I deem useful for OpenCL support in GPGPUSim.

Patch 1 fixes a problem for both Cuda and OpenCL simulation under the following conditions:

Integer and floating point instructions are executed by the same unit (e.g. SM2_GTX480)
The number of pipeline stages for integer operations exceeds that of floating point ops. (e.g. the shfl instructions)

The deprecated GeForce GTX750Ti simulation model is an example that fits the bill. Technically the GTX480 model would too, but IIRC the shfl instructions required for violation have been introduced with Kepler, hence the PTX compiler wouldn't issue such instructions for a Fermi-generation GPU in the first place, masking the issue.

Patch 2 is a simple fix addressing issue #138 in the simplest way imaginable to fix linking of the OpenCL module. There is a lot more work to be done to improve compilation flow, but with this fix there exists a way of running OpenCL kernels. More on that in issue #154 .

Patch 3 is taken from PR #21 and as noted is helpful for applications using the C++ API. It's also just a simple patch that extends the library in a strictly non-harmful way.

Patch 4 is mainly a pass to expose more information through clinfo. It's still far from perfect, but it no longer crashes.

Patch 5 adds minimal support for profiling within an OpenCL program. This is the preferred way for obtaining run-times of kernels in OpenCL, as it facilitates a generic API for hardware-specific time measurement methods (e.g. on-device timers). This is likely to be more accurate than software wall-time measurement mechanisms. I use these APIs in some benchmarks that I would like to release once I sorted out licensing (don't wait for these benchmarks, I currently have a PhD dissertation write-up to prioritise).

RSpliet · 2019-11-20T14:29:17Z

@aamodt @tgrogers Ping?

RSpliet · 2020-01-13T15:41:00Z

Another ping? These are pretty solid patches, if I may say so myself 😉

…y SF. Fixes a problem exposed (but not introduced) by: d212d7e: take account of shfl latency Under the SM2_GTX480 configuration, a shfl instruction now results in an out-of- bounds write into a non-allocated pipeline slot, causing "random" crashes. While at it, tidy up a bit. There doesn't seem to be a strongly enforced coding convention, but three tabs for one level of identation seems a tad excessive for any style. Signed-off-by: Roy Spliet <[email protected]>

The simplest fix to issue gpgpu-sim#138. Fixes linking error when building with OpenCL. For now this variable is unused. Signed-off-by: Roy Spliet <[email protected]>

Unbreaks clinfo largely. Signed-off-by: Roy Spliet <[email protected]>

Helps programs report their own simulated time. Signed-off-by: Roy Spliet <[email protected]>

RSpliet · 2020-04-10T23:29:08Z

Rebased against current upstream dev branch.

RSpliet · 2020-05-27T14:44:25Z

@tgrogers after your pull request was merged, my patches no longer apply cleanly. I suspect the conflict resolutions proposed by github are insufficient and will include compilation- and run-time issues. I will look into them at a later point, possibly this weekend.
On a related note, I have recently pushed the OpenCL micro-benchmark suite that I used for my architectural research, which can be found here. Most of the kernels are from different sources (parboil, rodinia, kinectfusion), making this a combined work license-technically. My portions have been licensed under the MIT license as this seemed most compatible with the license of the other suites. They build with CMake, requiring only the presence of OpenCL libraries and headers.
I believe this work can be useful for CI testing of GPGPU's OpenCL support for two reasons. Firstly, for 16 of the benchmarks I have included reference output values that can be used to validate correctness of the results. Secondly, rather than hard-coding parameters, the programs all has a simple unified interface that permits selecting the OpenCL platform (-P), device (-d), number of repetitions each kernel must be executed (-I) and a switch to enable output validation (-c).
If you agree and are interested in using CLaxon for CI, please let me know if there are any specific features or modifications that might be of help. Meanwhile, once I've forward-ported the OpenCL contributions in this pull request, I will make sure that at least some of these benchmarks run free of problems. Hopefully this could be a first step forward into making OpenCL a first-class citizen again on GPGPU-Sim!

aamodt · 2020-05-28T02:49:48Z

Thanks for the contribution @RSpliet ! We are expecting to push another set of significant changes sometime in the next few days (corresponding to our Accel-Sim / GPGPU-Sim 4.0 paper appearing at ISCA 2020). Once our changes appear if you can then update your pull request so there are no conflicts and they pass the checks I'll be happy to take a detailed look and assuming there are no major problems go ahead and merge your changes.

aamodt · 2020-06-11T02:03:09Z

Just an update. We don't expect to merge the rest of the changes until end of June, so if you want to update your pull request now we can merge it in the meantime.

RSpliet · 2020-06-16T11:27:20Z

@aamodt Thanks for the heads-up. Unfortunately I will not be able to find time before that merge, so please go ahead and merge your big developments when they're ready and I'll dive back into this at a later stage.

aamodt · 2020-07-14T04:17:16Z

The second batch of changes have been added.

This was referenced Oct 18, 2019

Problems running OpenCL kernels on GPGPUSim #154

Open

Continuous integration test times out/broken. #152

Open

RSpliet and others added 5 commits April 11, 2020 00:22

libopencl: Define no_of_ptx

ac7bc41

The simplest fix to issue gpgpu-sim#138. Fixes linking error when building with OpenCL. For now this variable is unused. Signed-off-by: Roy Spliet <[email protected]>

Add some OpenCL functions so that the C++ API can be used.

e8dc59c

libopencl: Amend clGetDeviceInfo, make missing information non-fatal.

a7f2d31

Unbreaks clinfo largely. Signed-off-by: Roy Spliet <[email protected]>

libopencl: Implement just enough event for getEventProfilingInfo()

88fa886

Helps programs report their own simulated time. Signed-off-by: Roy Spliet <[email protected]>

RSpliet force-pushed the dev branch from 4fcff17 to 88fa886 Compare April 10, 2020 23:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assorted fixes and OpenCL enhancements #153

Assorted fixes and OpenCL enhancements #153

RSpliet commented Oct 18, 2019 •

edited

Loading

RSpliet commented Nov 20, 2019

RSpliet commented Jan 13, 2020

RSpliet commented Apr 10, 2020

RSpliet commented May 27, 2020 •

edited

Loading

aamodt commented May 28, 2020 •

edited

Loading

aamodt commented Jun 11, 2020

RSpliet commented Jun 16, 2020

aamodt commented Jul 14, 2020

Assorted fixes and OpenCL enhancements #153

Are you sure you want to change the base?

Assorted fixes and OpenCL enhancements #153

Conversation

RSpliet commented Oct 18, 2019 • edited Loading

RSpliet commented Nov 20, 2019

RSpliet commented Jan 13, 2020

RSpliet commented Apr 10, 2020

RSpliet commented May 27, 2020 • edited Loading

aamodt commented May 28, 2020 • edited Loading

aamodt commented Jun 11, 2020

RSpliet commented Jun 16, 2020

aamodt commented Jul 14, 2020

RSpliet commented Oct 18, 2019 •

edited

Loading

RSpliet commented May 27, 2020 •

edited

Loading

aamodt commented May 28, 2020 •

edited

Loading