Assorted fixes and OpenCL enhancements #153
base: dev
Another ping? These are pretty solid patches, if I may say so myself 😉
…y SF. Fixes a problem exposed (but not introduced) by d212d7e ("take account of shfl latency"): under the SM2_GTX480 configuration, a shfl instruction now results in an out-of-bounds write into a non-allocated pipeline slot, causing "random" crashes. While at it, tidy up a bit. There doesn't seem to be a strongly enforced coding convention, but three tabs for one level of indentation seems a tad excessive for any style. Signed-off-by: Roy Spliet <[email protected]>
The simplest fix for issue gpgpu-sim#138. Fixes a linking error when building with OpenCL. For now this variable is unused. Signed-off-by: Roy Spliet <[email protected]>
Largely unbreaks clinfo. Signed-off-by: Roy Spliet <[email protected]>
Helps programs report their own simulated time. Signed-off-by: Roy Spliet <[email protected]>
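To illustrate the class of bug the first commit fixes, here is a minimal C++17 sketch of a fixed-depth functional-unit pipeline in which an instruction's latency selects the slot it is written into. `LatencyPipeline` and `required_depth` are hypothetical names chosen for illustration, not GPGPU-Sim's actual classes or fields. The point is only that an instruction whose latency exceeds the allocated depth (as a shfl can, once its latency is accounted for) must either be rejected or the pipeline sized to fit, rather than written into a slot that was never allocated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical fixed-depth pipeline; NOT GPGPU-Sim's real data structure.
// slots_[i] holds the op that will retire after i + 1 more cycles.
class LatencyPipeline {
public:
    explicit LatencyPipeline(std::size_t depth) : slots_(depth) {
        assert(depth > 0 && "pipeline needs at least one slot");
    }

    // Place `op` so it retires after `latency` cycles. Returns false instead
    // of writing out of bounds when latency exceeds the allocated depth, or
    // when the target slot is already occupied (structural hazard).
    bool issue(const std::string& op, std::size_t latency) {
        if (latency == 0 || latency > slots_.size()) return false;
        std::optional<std::string>& slot = slots_[latency - 1];
        if (slot.has_value()) return false;
        slot = op;
        return true;
    }

    // Advance one cycle: slot 0 retires, every other op moves one slot closer.
    std::optional<std::string> cycle() {
        std::optional<std::string> retired = slots_.front();
        for (std::size_t i = 0; i + 1 < slots_.size(); ++i) {
            slots_[i] = slots_[i + 1];
        }
        slots_.back().reset();
        return retired;
    }

    std::size_t depth() const { return slots_.size(); }

private:
    std::vector<std::optional<std::string>> slots_;
};

// Size the pipeline from the longest latency any op in the configuration can
// have (including e.g. shfl), instead of from a fixed per-config constant.
inline std::size_t required_depth(const std::vector<std::size_t>& latencies) {
    std::size_t max_latency = 1;
    for (std::size_t l : latencies) max_latency = std::max(max_latency, l);
    return max_latency;
}
```

Sizing the pipeline from the maximum latency in the configuration removes the failure mode outright; the bounds check in `issue` additionally turns any remaining mismatch into a detectable error instead of silent memory corruption and "random" crashes.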
Rebased against current upstream dev branch.
@tgrogers after your pull request was merged, my patches no longer apply cleanly. I suspect the conflict resolutions proposed by GitHub are insufficient and would introduce compile-time and run-time issues. I will look into them at a later point, possibly this weekend.
Thanks for the contribution, @RSpliet! We are expecting to push another set of significant changes sometime in the next few days (corresponding to our Accel-Sim / GPGPU-Sim 4.0 paper appearing at ISCA 2020). Once our changes appear, if you can update your pull request so there are no conflicts and it passes the checks, I'll be happy to take a detailed look and, assuming there are no major problems, go ahead and merge your changes.
Just an update: we don't expect to merge the rest of the changes until the end of June, so if you want to update your pull request now, we can merge it in the meantime.
@aamodt Thanks for the heads-up. Unfortunately I will not be able to find time before that merge, so please go ahead and merge your big developments when they're ready and I'll dive back into this at a later stage.
The second batch of changes has been added.
A second instalment of patches that I deem useful for OpenCL support in GPGPU-Sim.
Patch 1 fixes a problem for both CUDA and OpenCL simulation under the following conditions:
The deprecated GeForce GTX750Ti simulation model is an example that fits the bill. Technically the GTX480 model would too, but IIRC the shfl instructions required to trigger the violation were only introduced with Kepler, so the PTX compiler wouldn't emit such instructions for a Fermi-generation GPU in the first place, masking the issue.
Patch 2 addresses issue #138 in the simplest way imaginable, fixing linking of the OpenCL module. There is a lot more work to be done to improve the compilation flow, but with this fix it is at least possible to run OpenCL kernels. More on that in issue #154.
Patch 3 is taken from PR #21 and, as noted there, is helpful for applications using the C++ API. It is also a small patch that extends the library in a strictly harmless way.
Patch 4 is mainly a pass to expose more information through clinfo. It's still far from perfect, but it no longer crashes.
Patch 5 adds minimal support for profiling within an OpenCL program. This is the preferred way of obtaining kernel run-times in OpenCL, as it provides a generic API over hardware-specific time measurement methods (e.g. on-device timers). This is likely to be more accurate than software wall-time measurement mechanisms. I use these APIs in some benchmarks that I would like to release once I have sorted out licensing (don't wait for these benchmarks; I currently have a PhD dissertation write-up to prioritise).
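As a rough sketch of how a host program might consume the profiling support from Patch 5, the standard OpenCL 1.2 flow is: create the command queue with `CL_QUEUE_PROFILING_ENABLE`, keep the `cl_event` returned by `clEnqueueNDRangeKernel`, wait for it, then read `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` with `clGetEventProfilingInfo`. Both values are device timestamps in nanoseconds. The helpers `elapsed_ms` and `kernel_time_ms` are hypothetical, and the sketch assumes the simulator reports at least these two timestamps; which other profiling fields the "minimal support" covers is not stated here.

```cpp
#include <cassert>
#include <cstdint>

// OpenCL profiling timestamps are device times in nanoseconds (cl_ulong).
// Hypothetical helper: convert a start/end pair (end >= start) to milliseconds.
inline double elapsed_ms(std::uint64_t start_ns, std::uint64_t end_ns) {
    return static_cast<double>(end_ns - start_ns) / 1.0e6;
}

// Only compiled where OpenCL headers are available.
#if defined(__has_include)
#if __has_include(<CL/cl.h>)
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>

// Hypothetical helper. Assumes the command queue was created with
// CL_QUEUE_PROFILING_ENABLE and `evt` is the event returned by
// clEnqueueNDRangeKernel. Returns a negative value on error.
// Declared inline so that, if unused, it is not emitted and no
// OpenCL symbols need to be linked.
inline double kernel_time_ms(cl_event evt) {
    // Profiling info is only valid once the command has completed.
    if (clWaitForEvents(1, &evt) != CL_SUCCESS) return -1.0;
    cl_ulong start = 0, end = 0;
    if (clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                                sizeof(start), &start, nullptr) != CL_SUCCESS)
        return -1.0;
    if (clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                                sizeof(end), &end, nullptr) != CL_SUCCESS)
        return -1.0;
    return elapsed_ms(start, end);
}
#endif
#endif
```

The `__has_include` guard only exists so the sketch compiles on machines without OpenCL headers; in a real host program you would include `<CL/cl.h>` unconditionally and link against the OpenCL library. Measuring `END - START` on the event, rather than wall time around `clFinish`, excludes host-side enqueue and synchronisation overhead, which is why the profiling API is the preferred route.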