@expikr
Contributor

@expikr expikr commented Aug 27, 2025

In my testing with statically linked user code, this gives much more stable timing behaviour.

@expikr expikr marked this pull request as draft August 27, 2025 21:03
@expikr expikr force-pushed the patch-3 branch 2 times, most recently from 092c1ef to befab1c August 28, 2025 05:17
@sezero
Contributor

sezero commented Aug 28, 2025

How really good an idea is using an undocumented NT call and linking to ntdll.dll for it?

@slouken
Collaborator

slouken commented Aug 28, 2025

> How really good an idea is using an undocumented NT call and linking to ntdll.dll for it?

I'm not sure.

@expikr, can you provide details on the testing that you did and the results that you got that convinced you that this is an improvement?

I'm not opposed to this being an option, but it should not entirely replace the existing code. Can you please adapt your change to switch between this approach and the original approach via #ifdef, and dynamically load NtDelayExecution() so we don't have to introduce a dependency on ntdll.lib?

If we decide we want to go this way, we can remove the old code and switch to static linking in a separate step.
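A rough sketch of what the requested switch could look like, for illustration only. The `USE_NT_DELAY_EXECUTION` macro and the `delay_ns` and `ns_to_relative_nt_interval` helpers are hypothetical names, not taken from the PR, and the `NtDelayExecution` prototype is the commonly circulated one for this undocumented call, so treat it as an assumption. The function is resolved from the already-loaded ntdll.dll at run time, which avoids linking against ntdll.lib.

```c
#include <stdint.h>

/* Hypothetical build switch for this sketch; not an actual SDL option. */
#ifndef USE_NT_DELAY_EXECUTION
#define USE_NT_DELAY_EXECUTION 1
#endif

/* NT intervals count 100 ns units; a negative value means "relative to now".
   Remainders below 100 ns are truncated. */
static int64_t ns_to_relative_nt_interval(uint64_t ns)
{
    return -(int64_t)(ns / 100);
}

#ifdef _WIN32
#include <windows.h>

/* Prototype as commonly declared for this undocumented call (assumption). */
typedef LONG (NTAPI *NtDelayExecution_t)(BOOLEAN Alertable, PLARGE_INTEGER DelayInterval);

static void delay_ns(uint64_t ns)
{
#if USE_NT_DELAY_EXECUTION
    /* Resolved once on first use; not thread-safe as written. */
    static NtDelayExecution_t nt_delay = NULL;
    static int looked_up = 0;
    if (!looked_up) {
        HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
        if (ntdll) {
            nt_delay = (NtDelayExecution_t)GetProcAddress(ntdll, "NtDelayExecution");
        }
        looked_up = 1;
    }
    if (nt_delay) {
        LARGE_INTEGER interval;
        interval.QuadPart = ns_to_relative_nt_interval(ns);
        nt_delay(FALSE, &interval);
        return;
    }
#endif
    /* Stand-in for the existing code path: millisecond Sleep(), rounded up. */
    Sleep((DWORD)(ns / 1000000 + (ns % 1000000 != 0)));
}
#endif /* _WIN32 */
```

Falling through to the fallback when the lookup fails means the build keeps working even if the symbol is unavailable.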

@slouken slouken added this to the 3.6.0 milestone Aug 28, 2025
@expikr
Contributor Author

expikr commented Aug 28, 2025

TBH I didn't test rigorously enough to positively benchmark an improvement. I was mostly going by what I'd read: that it provides 100ns precision and has been a stable API since the very beginning of Windows, as it's the underlying syscall used by all the sleep- and wait-related Win32 APIs. So I just briefly plopped it into a test app and looked at the FPS counter to confirm my bias.

I've made the requested changes. I'll write a more proper benchmark using QueryPerformanceCounter and report back.
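One possible shape for such a benchmark, as a minimal sketch: time each sleep with `QueryPerformanceCounter` and record how far the measured duration overshoots the request. The helper names (`ticks_to_ns`, `overshoot_ns`, `report_sleep_overshoot`) are made up for this example, and `Sleep()` stands in for whichever delay function is under test.

```c
#include <stdint.h>

/* Convert performance-counter ticks to nanoseconds. The whole-second part is
   split off so that ticks * 1e9 cannot overflow for typical counter
   frequencies (assumes freq is well below ~9e9 Hz). */
static int64_t ticks_to_ns(int64_t ticks, int64_t freq)
{
    int64_t whole = ticks / freq;
    int64_t rem = ticks % freq;
    return whole * 1000000000LL + (rem * 1000000000LL) / freq;
}

/* How much longer than requested the sleep took (negative if it woke early). */
static int64_t overshoot_ns(int64_t requested_ns, int64_t start_ticks,
                            int64_t end_ticks, int64_t freq)
{
    return ticks_to_ns(end_ticks - start_ticks, freq) - requested_ns;
}

#ifdef _WIN32
#include <windows.h>
#include <stdio.h>

/* Sleep() is only a stand-in here; swap in the delay function being compared. */
static void report_sleep_overshoot(DWORD requested_ms, int iterations)
{
    LARGE_INTEGER freq, t0, t1;
    int64_t total = 0, worst = 0;
    int i;

    if (iterations <= 0) {
        return;
    }
    QueryPerformanceFrequency(&freq);
    for (i = 0; i < iterations; ++i) {
        int64_t over;
        QueryPerformanceCounter(&t0);
        Sleep(requested_ms);
        QueryPerformanceCounter(&t1);
        over = overshoot_ns((int64_t)requested_ms * 1000000LL,
                            t0.QuadPart, t1.QuadPart, freq.QuadPart);
        total += over;
        if (over > worst) {
            worst = over;
        }
    }
    printf("mean overshoot: %lld ns, worst: %lld ns\n",
           (long long)(total / iterations), (long long)worst);
}
#endif /* _WIN32 */
```

Reporting the worst case alongside the mean matters here, since the claim under test is about timing stability rather than average accuracy.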

@expikr expikr marked this pull request as ready for review August 28, 2025 19:14
@e4m2
Contributor

e4m2 commented Aug 31, 2025

Might also be worth testing against RtlDelayExecution(). It uses a small userspace spin loop before delegating to NtDelayExecution(). Sleep() also calls it under the hood since (at least?) Windows 11.

EDIT: Looked over this again and I was wrong. The spin loop is only used if the sleep duration is zero, i.e. a yield is being performed instead of an actual sleep. When sleeping, RtlDelayExecution() is exactly equivalent to NtDelayExecution().

@nfries88

nfries88 commented Sep 3, 2025

NtDelayExecution takes a timeout with 100ns resolution, but in practice its resolution is limited to 500us (the same as the now-documented high-resolution waitable timers), and even then only if you use the undocumented NtSetTimerResolution instead of the documented timeBeginPeriod. NtSetTimerResolution also takes its argument in 100ns units but is limited to the range returned by NtQueryTimerResolution; on every install of Windows I've tested, the highest resolution the system allows is 500us. IIRC the implementation in Wine limits this to 1ms, even though it could hypothetically allow anything down to the timer coalescing frequency of the system it's running on, so there's not even a hypothetical advantage there.
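To make the units concrete, here is a small sketch of the clamping described above. The `NtQueryTimerResolution` prototype is the commonly circulated one for this undocumented call and should be treated as an assumption, as should the helper names; the 500us figure (5000 units of 100ns) is simply the value reported in this comment, not something the code guarantees.

```c
#include <stdint.h>

/* Timer resolutions are expressed in 100 ns units, so 5000 units = 500 us. */
static uint32_t resolution_units_to_us(uint32_t units)
{
    return units / 10;
}

/* A requested resolution is limited to the range the system reports.
   "finest" is the smallest interval allowed, "coarsest" the largest. */
static uint32_t clamp_timer_resolution(uint32_t desired, uint32_t finest,
                                       uint32_t coarsest)
{
    if (desired < finest) {
        return finest;
    }
    if (desired > coarsest) {
        return coarsest;
    }
    return desired;
}

#ifdef _WIN32
#include <windows.h>

/* Prototype as commonly declared for this undocumented call (assumption).
   Note the naming: "MaximumResolution" is the finest (smallest) interval. */
typedef LONG (NTAPI *NtQueryTimerResolution_t)(PULONG MinimumResolution,
                                               PULONG MaximumResolution,
                                               PULONG CurrentResolution);

/* Returns nonzero on success and fills in the three values (100 ns units). */
static int query_timer_resolution(ULONG *coarsest, ULONG *finest, ULONG *current)
{
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
    NtQueryTimerResolution_t query = ntdll
        ? (NtQueryTimerResolution_t)GetProcAddress(ntdll, "NtQueryTimerResolution")
        : NULL;
    if (!query) {
        return 0;
    }
    return query(coarsest, finest, current) >= 0; /* same check as NT_SUCCESS */
}
#endif /* _WIN32 */
```

With a reported finest value of 5000, asking for 100us (1000 units) would be clamped back to 5000 units, i.e. 500us.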

I have used these before in combination with other undocumented functions to get delays more useful for timers combined with waiting on overlapped I/O completions, but I don't see their utility for simple sleep situations, especially now that there are high-resolution waitable timers.
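For comparison, a minimal sketch of a sleep built on the documented high-resolution waitable timer. The helper names are hypothetical, and this is not presented as the code SDL currently uses. `CREATE_WAITABLE_TIMER_HIGH_RESOLUTION` requires Windows 10 version 1803 or later; on older systems `CreateWaitableTimerExW` fails and the function returns 0.

```c
#include <stdint.h>

/* Relative due times are negative counts of 100 ns units. Rounds up so a
   nonzero request never becomes a zero-length wait. */
static int64_t relative_due_time_100ns(uint64_t ns)
{
    uint64_t units = ns / 100 + (ns % 100 != 0);
    return -(int64_t)units;
}

#ifdef _WIN32
#include <windows.h>

#ifndef CREATE_WAITABLE_TIMER_HIGH_RESOLUTION
#define CREATE_WAITABLE_TIMER_HIGH_RESOLUTION 0x00000002
#endif

/* Returns nonzero if the wait completed. Creates a timer per call for
   simplicity; real code would likely create it once and reuse it. */
static int hires_timer_sleep_ns(uint64_t ns)
{
    LARGE_INTEGER due;
    int ok = 0;
    HANDLE timer = CreateWaitableTimerExW(NULL, NULL,
                                          CREATE_WAITABLE_TIMER_HIGH_RESOLUTION,
                                          TIMER_ALL_ACCESS);
    if (!timer) {
        return 0; /* flag unsupported or creation failed; caller falls back */
    }
    due.QuadPart = relative_due_time_100ns(ns);
    if (SetWaitableTimer(timer, &due, 0, NULL, NULL, FALSE)) {
        ok = (WaitForSingleObject(timer, INFINITE) == WAIT_OBJECT_0);
    }
    CloseHandle(timer);
    return ok;
}
#endif /* _WIN32 */
```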

@expikr expikr marked this pull request as draft October 24, 2025 16:25