Speed up core library loading, bring compilation time of small files down to <10ms mark #5631
Comments
Thanks for raising this issue. Is it possible to try the prebuilt releases here to see if you can repro the issue? On my machine it takes 0.2 seconds for slangc --help. One difference with the prebuilt binaries is that the core module is embedded into libslang.so itself, so there won't be any filesystem calls. |
I'm using the prebuilt binaries. For me slangc takes 1.33s, whereas glslang similarly takes 0.00s. My OS is Arch Linux. Edit: the prebuilt binaries are slow, but when I compiled it myself the time dropped to 0.08 seconds, so I'm not sure what is up with the prebuilt binaries. All I did to build was run cmake .. and then make; I didn't change any settings. |
Prebuilt is indeed faster. So, I found the issue, it seems slangc tries to write I could reproduce the issue like this:
But still, even 100ms is a long time for a shader compiler and it should be improved, especially when you consider that there is no compilation happening. |
This would explain it. When I used the prebuilt, I used a folder that I do not have write permissions to, but when I compiled myself, I observed a slow first run but the next runs ran fine. |
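The slow-first-run versus slow-every-run behavior described above can be illustrated with a minimal sketch. Everything in it is hypothetical and is not Slang's actual code: `buildCoreModule`, `loadOrBuild`, `clearCache`, and the blob contents are made-up stand-ins, and the only assumption taken from the thread is that the module is built on first use and then written to a file.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical stand-in for the expensive step of compiling the core module
// from source. `buildCount` records how often that cost is paid.
inline std::string buildCoreModule(int& buildCount) {
    ++buildCount;
    return "serialized-core-module";
}

// Load the cached blob if it exists; otherwise build it and try to cache it.
inline std::string loadOrBuild(const std::string& cachePath, int& buildCount) {
    assert(!cachePath.empty());
    {
        std::ifstream in(cachePath, std::ios::binary);
        if (in) {
            std::istreambuf_iterator<char> begin(in), end;
            return std::string(begin, end);
        }
    }
    std::string blob = buildCoreModule(buildCount);
    // Best-effort write: in a read-only (or missing) directory the open fails,
    // nothing is cached, and the next call rebuilds from scratch.
    std::ofstream out(cachePath, std::ios::binary);
    if (out) out << blob;
    return blob;
}

// Delete the cache file so the next call rebuilds it.
inline bool clearCache(const std::string& cachePath) {
    return std::remove(cachePath.c_str()) == 0;
}
```

With a writable path the build cost is paid once; with a path that cannot be written, every call pays it again, which matches the two timings reported in this thread.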
I should probably document the behavior of slang-core-module.bin. |
When building slang for use from read-only locations or for any production use, we recommend turning on the SLANG_EMBED_CORE_MODULE setting in cmake. This will precompile the core module and embed its serialized binary into the slang library, which avoids building the core module during the first run and writing it out to a file. The 100ms is the time needed to deserialize the core module. Slang has a much richer and more complex core library than GLSL, and it takes longer to deserialize into memory. Further improving this performance is possible and will be necessary in the near future, since the core module is only going to get bigger. But this requires a deep infrastructural change to support on-demand lazy deserialization, so we probably won't be able to get to it in the very short term. In practice, most engines would want to integrate their shader building workflow into a custom multithreaded build system that calls the slang API directly, in which case this initialization cost is amortized over many compiles and is not a huge issue. |
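To make the amortization argument concrete, here is a small sketch that compares paying the startup cost per file with paying it once. `Session` is a hypothetical stand-in, not the Slang API: its constructor represents the expensive initialization (for Slang, loading the core module), and `compile` is a dummy that returns the source length. Threading is left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical compiler session; constructing it is the expensive part.
struct Session {
    explicit Session(int& initCounter) { ++initCounter; }
    // Dummy per-file compile: returns the source length as its "result".
    std::size_t compile(const std::string& source) const { return source.size(); }
};

// CLI-style usage: every file pays the initialization cost again.
inline std::vector<std::size_t> compileEachWithNewSession(
    const std::vector<std::string>& sources, int& initCounter) {
    std::vector<std::size_t> results;
    for (const auto& source : sources) {
        Session session(initCounter);
        results.push_back(session.compile(source));
    }
    assert(results.size() == sources.size());
    return results;
}

// Build-system-style usage: one session, reused for every file.
inline std::vector<std::size_t> compileAllWithSharedSession(
    const std::vector<std::string>& sources, int& initCounter) {
    Session session(initCounter);
    std::vector<std::size_t> results;
    for (const auto& source : sources) {
        results.push_back(session.compile(source));
    }
    assert(results.size() == sources.size());
    return results;
}
```

For N files the first driver initializes N times and the second initializes once, so a fixed startup cost of around 100ms shrinks to 100ms/N per file.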
Does it make sense to enable it in cmake by default? Contrary to what was said, the prebuilt releases don't have that option enabled; it may be enabled for wasm only, I guess. I will open a PR if that is ok.
It would be nice to remove the deserialization step entirely, either by parsing and compiling the stdlib on the fly, or by compiling the standard library into the compiler binary without any runtime parsing. Not sure how doable that is. The first approach may need lazy/incremental compilation too. For the second approach, the stdlib can just be implemented in the C++ code or something. For the first approach, I also suggest looking at Naga, as it is one of the fastest shader compilers out there.
Oh, that's good to hear. With that, the initial slowdown indeed doesn't matter much. |
The binary blob we encode into the slang library is already the parsed and checked AST and pregenerated IR of the core module, so it is already post parsing. But it is still quite large and takes time to be fully deserialized. Compiling from source will take much longer, almost 2s, like you are experiencing on the first runs. And hard-coding the generation of AST nodes and IR of core module declarations won't be much faster than just deserializing the binary blob either. I think the right way to make things faster is to make the serialized binary randomly accessible, so that we only deserialize an AST or IR node when it is actually used by the shader we are compiling. |
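The random-access idea can be sketched as an offset table plus decode-on-first-use. This is only an illustration under made-up assumptions: the blob layout, `serialize`, and `LazyModule` are hypothetical and do not describe Slang's real serialization format, and plain strings stand in for AST or IR nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical blob layout (not Slang's real format):
//   [u32 count][u32 offset[0..count]][payload bytes]
// Node i occupies payload bytes [offset[i], offset[i + 1]).
inline std::vector<std::uint8_t> serialize(const std::vector<std::string>& nodes) {
    std::vector<std::uint32_t> header;
    header.push_back(static_cast<std::uint32_t>(nodes.size()));
    std::uint32_t offset = 0;
    for (const auto& n : nodes) {
        header.push_back(offset);
        offset += static_cast<std::uint32_t>(n.size());
    }
    header.push_back(offset);  // end offset of the last node
    std::vector<std::uint8_t> blob(header.size() * sizeof(std::uint32_t));
    std::memcpy(blob.data(), header.data(), blob.size());
    for (const auto& n : nodes) blob.insert(blob.end(), n.begin(), n.end());
    return blob;
}

class LazyModule {
public:
    explicit LazyModule(std::vector<std::uint8_t> blob)
        : blob_(std::move(blob)), cache_(readU32(0)), loaded_(cache_.size(), false) {}

    // Decode node `index` on first use; later calls return the cached copy.
    const std::string& node(std::size_t index) {
        assert(index < cache_.size());
        if (!loaded_[index]) {
            const std::size_t payload = (cache_.size() + 2) * sizeof(std::uint32_t);
            const std::size_t begin = payload + readU32(1 + index);
            const std::size_t end = payload + readU32(2 + index);
            cache_[index].assign(blob_.begin() + begin, blob_.begin() + end);
            loaded_[index] = true;
            ++decoded_;
        }
        return cache_[index];
    }

    // Number of nodes that have actually been decoded so far.
    std::size_t decodedCount() const { return decoded_; }

private:
    std::uint32_t readU32(std::size_t slot) const {
        std::uint32_t value;
        std::memcpy(&value, blob_.data() + slot * sizeof(value), sizeof(value));
        return value;
    }

    std::vector<std::uint8_t> blob_;
    std::vector<std::string> cache_;
    std::vector<bool> loaded_;
    std::size_t decoded_ = 0;
};
```

Constructing `LazyModule` only reads the node count, so startup cost no longer scales with the size of the module; a shader that touches three declarations pays for three decodes.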
If the prebuilt binary is not embedding the core module, that is a regression and we need to fix it. It was likely introduced when we refactored our cmake scripts. |
Description:
The core module takes a long time to load, which in turn slows down all operations with the CLI; this should be improved.
Compilation of hello.slang with slangc should be around 1ms mark. This will require a number of changes in the codebase.
Old description:
Simple call takes almost two seconds! Even chromium starts faster on my machine.
For comparison glslang:
ltrace tells that it is stuck during slang_createGlobalSession.
strace shows that it tries to newfstatat libslang.so, then tries to openat slang-core-module.bin, and then it is a long long chain of brk calls and it is stuck during brk calls it seems. Looks like it tries to allocate a loooooot of memory during startup? using brk????
To open libslang.so?? Shouldn't happen.
slang rev: dbc28b4
os: NixOS 24.11
custom build of slang, if that is related.