Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assorted fixes and OpenCL enhancements #153

Open
wants to merge 5 commits into
base: dev
Choose a base branch
from
Open

Conversation

RSpliet
Copy link
Contributor

@RSpliet RSpliet commented Oct 18, 2019

A second instalment of patches that I deem useful for OpenCL support in GPGPUSim.

Patch 1 fixes a problem for both Cuda and OpenCL simulation under the following conditions:

  • Integer and floating point instructions are executed by the same unit (e.g. SM2_GTX480)
  • The number of pipeline stages for integer operations exceeds that of floating point ops. (e.g. the shfl instructions)

The deprecated GeForce GTX750Ti simulation model is an example that fits the bill. Technically the GTX480 model would too, but IIRC the shfl instructions required for violation have been introduced with Kepler, hence the PTX compiler wouldn't issue such instructions for a Fermi-generation GPU in the first place, masking the issue.

Patch 2 is a simple fix addressing issue #138 in the simplest way imaginable to fix linking of the OpenCL module. There is a lot more work to be done to improve compilation flow, but with this fix there exists a way of running OpenCL kernels. More on that in issue #154 .

Patch 3 is taken from PR #21 and as noted is helpful for applications using the C++ API. It's also just a simple patch that extends the library in a strictly non-harmful way.

Patch 4 is mainly a pass to expose more information through clinfo. It's still far from perfect, but it no longer crashes.

Patch 5 adds minimal support for profiling within an OpenCL program. This is the preferred way for obtaining run-times of kernels in OpenCL, as it facilitates a generic API for hardware-specific time measurement methods (e.g. on-device timers). This is likely to be more accurate than software wall-time measurement mechanisms. I use these APIs in some benchmarks that I would like to release once I sorted out licensing (don't wait for these benchmarks, I currently have a PhD dissertation write-up to prioritise).

@RSpliet
Copy link
Contributor Author

RSpliet commented Nov 20, 2019

@aamodt @tgrogers Ping?

@RSpliet
Copy link
Contributor Author

RSpliet commented Jan 13, 2020

Another ping? These are pretty solid patches, if I may say so myself 😉

RSpliet and others added 5 commits April 11, 2020 00:22
…y SF.

Fixes a problem exposed (but not introduced) by:
  d212d7e: take account of shfl latency

Under the SM2_GTX480 configuration, a shfl instruction now results in an out-of-
bounds write into a non-allocated pipeline slot, causing "random" crashes.

While at it, tidy up a bit. There doesn't seem to be a strongly enforced coding
convention, but three tabs for one level of identation seems a tad excessive
for any style.

Signed-off-by: Roy Spliet <[email protected]>
The simplest fix to issue gpgpu-sim#138. Fixes linking error when building with OpenCL.
For now this variable is unused.

Signed-off-by: Roy Spliet <[email protected]>
Helps programs report their own simulated time.

Signed-off-by: Roy Spliet <[email protected]>
@RSpliet
Copy link
Contributor Author

RSpliet commented Apr 10, 2020

Rebased against current upstream dev branch.

@RSpliet
Copy link
Contributor Author

RSpliet commented May 27, 2020

@tgrogers after your pull request was merged, my patches no longer apply cleanly. I suspect the conflict resolutions proposed by github are insufficient and will include compilation- and run-time issues. I will look into them at a later point, possibly this weekend.
On a related note, I have recently pushed the OpenCL micro-benchmark suite that I used for my architectural research, which can be found here. Most of the kernels are from different sources (parboil, rodinia, kinectfusion), making this a combined work license-technically. My portions have been licensed under the MIT license as this seemed most compatible with the license of the other suites. They build with CMake, requiring only the presence of OpenCL libraries and headers.
I believe this work can be useful for CI testing of GPGPU's OpenCL support for two reasons. Firstly, for 16 of the benchmarks I have included reference output values that can be used to validate correctness of the results. Secondly, rather than hard-coding parameters, the programs all has a simple unified interface that permits selecting the OpenCL platform (-P), device (-d), number of repetitions each kernel must be executed (-I) and a switch to enable output validation (-c).
If you agree and are interested in using CLaxon for CI, please let me know if there are any specific features or modifications that might be of help. Meanwhile, once I've forward-ported the OpenCL contributions in this pull request, I will make sure that at least some of these benchmarks run free of problems. Hopefully this could be a first step forward into making OpenCL a first-class citizen again on GPGPU-Sim!

@aamodt
Copy link
Collaborator

aamodt commented May 28, 2020

Thanks for the contribution @RSpliet ! We are expecting to push another set of significant changes sometime in the next few days (corresponding to our Accel-Sim / GPGPU-Sim 4.0 paper appearing at ISCA 2020). Once our changes appear if you can then update your pull request so there are no conflicts and they pass the checks I'll be happy to take a detailed look and assuming there are no major problems go ahead and merge your changes.

@aamodt
Copy link
Collaborator

aamodt commented Jun 11, 2020

Just an update. We don't expect to merge the rest of the changes until end of June, so if you want to update your pull request now we can merge it in the meantime.

@RSpliet
Copy link
Contributor Author

RSpliet commented Jun 16, 2020

@aamodt Thanks for the heads-up. Unfortunately I will not be able to find time before that merge, so please go ahead and merge your big developments when they're ready and I'll dive back into this at a later stage.

@aamodt
Copy link
Collaborator

aamodt commented Jul 14, 2020

The second batch of changes have been added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants