Sleep before ReadBuffer to resolve Nvidia's busywait issue #60
Tested and approved, compatible with eXternal Optimizations. |
Oh that's a good, working hack. Thanks! I might prefer your idea over krnlx's suggestion from #54 |
Just merged my branch back up w/ upstream and also tested this patch, and it seems to work great - freed up a core from 100% to now using about 7% (on an i7-4790k). 970 is mining at about 45 sol/s (unchanged), XC AVX2 -t6 is up from 24sol/s to 26+ sol/s. Tested also on my 470 dev system, performance is unchanged (~77 sol/s w/ a memstrap 1500 system), CPU < 2%. |
The timing here is not perfect. With two timers, one measuring kernel time and one measuring the time spent in the read, it should be possible to reduce the load even further. The remaining CPU load is because the sleep time will always be 2% less than the time required for the kernel to complete. |
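A minimal sketch of the two-timer idea from the comment above, not the code from this patch. It assumes the kernel run time can be approximated as the time slept plus the time then spent blocked in the read, and that the next sleep is shortened by a skip ratio (2% in the comment). The helper name `next_sleep_ns` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: choose the next sleep
 * length from the two measurements (time slept, time blocked in the read). */
static uint64_t next_sleep_ns(uint64_t slept_ns, uint64_t read_wait_ns,
                              double skip_ratio)
{
    assert(skip_ratio >= 0.0 && skip_ratio <= 1.0);
    /* Time slept plus time blocked in the read approximates kernel run time. */
    uint64_t kernel_ns = slept_ns + read_wait_ns;
    /* Wake up slightly early so that a late wake-up does not stall the GPU. */
    return (uint64_t)((double)kernel_ns * (1.0 - skip_ratio));
}
```

With a skip ratio of 0.02 the thread still spends roughly the last 2% of each kernel run inside the blocking read, which would account for the few percent of CPU load that remains.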
Awesome! My weak GeForce GT 740M is making a few more sols than before. It sure helps weak GPUs. |
@mbevand something worth considering is changing the mining pipeline a bit.
In this patch I do:
Something worth considering:
This is a bit more complex but has two benefits: 1. there is even less downtime (with this, running a single instance might turn out to be better), and 2. the timing adjustment is even better. |
Cool. Apparently zawawa was even able to include and compile your patch on Cygwin/Windows: https://bitcointalk.org/index.php?topic=1666489.msg16854647#msg16854647 I asked him to confirm that clock_gettime(CLOCK_MONOTONIC...) works on Cygwin/Windows. |
It doesn't; he replaced it with something else. Also, Cygwin doesn't have |
I guess we don't have to worry about Windows right now. |
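For reference, a small sketch of interval timing with `clock_gettime(CLOCK_MONOTONIC, ...)`, the POSIX call discussed above. The helper names `monotonic_now` and `timespec_diff_ns` are hypothetical and are not claimed to match the patch; whether the call behaves the same on Cygwin is exactly the open question in this thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC readings. */
static int64_t timespec_diff_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}

/* Read the monotonic clock; asserts that the clock is available. */
static struct timespec monotonic_now(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* silence unused-variable warnings when NDEBUG is set */
    return ts;
}
```

A monotonic clock is the usual choice for this kind of measurement because it is not affected by wall-clock adjustments.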
Give me 30min before merging it, I might try something else that will have lower load on CPU. |
Take your time. I will probably need 1-2 days before I merge it. |
Sure. |
I did not observe any negative influence on performance with this patch. The CPU core usage is reduced from 100% to about 8%. I am getting a consistent 38 Sols/s on an MSI Gaming X 1070 with one instance while testing. Sols/s on AMD cards (RX480) are not affected. The clFlush is required because AMD only starts working when a blocking operation is waiting, which is delayed in this case. This is only a suggestion and a solution that works for me; it isn't perfect. With this, silentarmy is the best miner for Nvidia that I know. Don't feel obligated to accept this PR; you can use parts of it, or all of it, to engineer the solution you think is best. I have tried the sched_yield trick; unfortunately, it seems to have stopped working.
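The order of operations described above (flush the queue, sleep, then do the blocking read) can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the `gpu_ops` callbacks are hypothetical stand-ins for `clFlush()` and a blocking `clEnqueueReadBuffer()`, so the example compiles without an OpenCL device, and the `demo_*` stubs only record the call order.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-ins for the OpenCL calls made by the miner. */
typedef struct {
    void (*flush)(void *ctx);         /* stands in for clFlush() */
    void (*blocking_read)(void *ctx); /* stands in for a blocking read */
    void *ctx;
} gpu_ops;

/* Sleep for roughly ns nanoseconds without spinning on a CPU core. */
static void sleep_ns(uint64_t ns)
{
    struct timespec ts;
    ts.tv_sec = (time_t)(ns / 1000000000ULL);
    ts.tv_nsec = (long)(ns % 1000000000ULL);
    nanosleep(&ts, NULL); /* may return early on a signal; fine for a sketch */
}

/* Flush so the GPU starts working, sleep through most of the kernel run,
 * and only then enter the read that would otherwise busy-wait. */
static void wait_for_results(const gpu_ops *ops, uint64_t sleep_for_ns)
{
    assert(ops && ops->flush && ops->blocking_read);
    ops->flush(ops->ctx);
    sleep_ns(sleep_for_ns);
    ops->blocking_read(ops->ctx);
}

/* Demo stubs that only append a marker to a caller-provided string. */
static void demo_flush(void *ctx) { strcat((char *)ctx, "F"); }
static void demo_read(void *ctx)  { strcat((char *)ctx, "R"); }
```

The flush comes first because, as noted above, without it an AMD GPU would only start working once the delayed blocking operation is issued.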
I reworked it. Getting the same hash rates as without the patch, the CPU usage is down to 4% from 100%. Parameters are tweakable. |
Awesome! Have you made sure it doesn't negatively impact AMD GPUs? Should we leave the code as-is or turn it on only when running against an Nvidia GPU? zawawa replied to me saying clock_gettime() does seem to work on Cygwin/Windows. |
I have run tests on AMD too and couldn't see any difference. The whole thing can be disabled by setting |
MSI RX480 X 8GB, test run of 1000 nonces: So the overhead is 0.6 Sol/s (or 0.3 with a tweak); it will be less on slower GPUs (as invocation time is longer). |
It might also have no influence at all with two instances running. |
Wait, wait. I tested the last solution, and there seems to be a small performance decrease. Celeron 1840, 6x1070. I think I will stay with the Kubuxu v1 solution; it is the fastest. |
@krnlx try playing with SLEEP_SKIP_RATIO; it was previously at 0.02. |
Thanks so much, I merged a cleaned-up version of your fix: Please note that you had 2 clFlush() calls and only 1 is needed. Please verify on Nvidia hardware that I didn't screw things up :) |
I had two flushes as I wanted the GPU to start work ASAP (applies to AMD) but it didn't seem to have any difference IIRC. |
That makes sense. You had a flush before enqueuing kernel_sols and the
On Tue, Nov 15, 2016 at 11:08 AM, Jakub Sztandera |
Also confirming on Nvidia, CPU usage is ~3%. |
Final comparison: Tested with |
I did not observe any negative influence on performance with this patch.
The CPU core usage is reduced from 100% to about 8%.
I am getting a consistent 38 Sols/s on an MSI Gaming X 1070 with one instance
while testing. Sols/s on AMD cards (RX480) are not affected.
The clFlush is required because AMD only starts working when a
blocking operation is waiting, which is delayed in this case, or when the queue is flushed.
This is only a suggestion and a solution that works for me; it isn't perfect. With this, silentarmy is the best miner for Nvidia that I know.
Don't feel obligated to accept this PR; you can use parts of it, or all of it,
to engineer the solution you think is best.
I have tried the sched_yield trick; unfortunately, it seems to have stopped
working.
Resolves #54