Scaling performance as sequences exceed core count #115
I guess regardless of how the wasm code communicates with the host, the host has to organize the threads/processes for each sequence. In AICI this is done with separate processes, which means you can kill a process and limit its execution time without overhead (if you put a time limit in wasmtime it seems to have significant overhead; I don't remember exactly, but I recall around 30% in the inner bias computation loop). The processes can also fork, though how much AICI functionality actually needs that is another question. Threads would potentially be easier to synchronize (though one has to be careful with things like rayon, which seems to have issues once you have more than 50 cores or so).
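For illustration, here is a minimal sketch of the process-based enforcement described above: kill a per-sequence worker process when it exceeds a deadline, so the wasm inner loop itself pays no instrumentation cost. The worker binary name and flag are invented; this is not the actual aicirt code.

```rust
use std::process::Command;
use std::thread;
use std::time::{Duration, Instant};

/// Sketch only: run one controller step in a child process and kill it if it
/// exceeds a per-step deadline. Killing the process enforces the limit from
/// the outside, without adding checks inside the wasm bias-computation loop.
fn run_step_with_deadline(
    worker_bin: &str,          // hypothetical worker binary
    seq_id: u64,
    deadline: Duration,
) -> std::io::Result<bool> {
    let mut child = Command::new(worker_bin)
        .arg(format!("--seq-id={seq_id}")) // hypothetical flag
        .spawn()?;

    let start = Instant::now();
    loop {
        // Non-blocking check whether the child finished on its own.
        if let Some(status) = child.try_wait()? {
            return Ok(status.success());
        }
        if start.elapsed() >= deadline {
            // Deadline exceeded: kill the whole process. No cooperation from
            // the wasm guest is required, so the hot loop pays no extra cost.
            child.kill()?;
            let _ = child.wait(); // reap the child
            return Ok(false);
        }
        thread::sleep(Duration::from_millis(1));
    }
}
```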
I did some cursory research into the libraries available for cross-process IPC in Rust, and, well, essentially all of them either add a significant degree of complexity to the implementation or aren't as latency-optimized (e.g. using …). What considerations went into using …? I ask because I'm wondering if it might make more sense to export a library interface and let the host decide how to handle concurrency, e.g. to oxidize it as a Python library for integration with vLLM using PyO3.
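To make the "library interface" idea concrete, here is a hypothetical sketch (all names invented, not aicirt's actual API) of a runtime that exposes per-sequence handles and leaves scheduling, threading, and process management entirely to the host:

```rust
/// Sketch of a library-style interface: the controller runtime owns the wasm
/// engine and module cache, but takes no position on concurrency.
pub struct ControllerRuntime { /* wasm engine, module cache, ... */ }

/// One instantiated controller for one sequence.
pub struct SequenceHandle { /* per-sequence controller state */ }

impl ControllerRuntime {
    /// Load a controller module once; instances are cheap to create per sequence.
    pub fn new(wasm_module: &[u8]) -> Result<Self, Box<dyn std::error::Error>> {
        let _ = wasm_module;
        Ok(ControllerRuntime {})
    }

    /// Create a handle for one sequence. The host decides whether handles live
    /// on threads, in a process pool, or inline in its own event loop.
    pub fn new_sequence(&self, seq_id: u64) -> SequenceHandle {
        let _ = seq_id;
        SequenceHandle {}
    }
}

impl SequenceHandle {
    /// Compute the logit bias for the next token; pure CPU work, safe to run
    /// on whatever thread or process the host chooses.
    pub fn compute_bias(&mut self, vocab_size: usize) -> Vec<f32> {
        vec![0.0; vocab_size]
    }
}
```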
Indeed, exposing aicirt as a library might be better from the Python standpoint. However, there is still internal concurrency within aicirt, namely running multiple sequence controllers in parallel. I think you don't want to expose that to Python, as it would have performance implications. Either way, it can be handled with processes (as done now) or threads.
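As a rough illustration of keeping that internal concurrency on the Rust side, here is a minimal PyO3 sketch (module, class, and method names are invented, and the parallelism is a placeholder): Python calls a single batched `step`, the GIL is released while the Rust workers run, and Python never sees the per-sequence threads. Assumes roughly `pyo3 = { version = "0.20", features = ["extension-module"] }`.

```rust
use pyo3::prelude::*;
use std::thread;

/// Hypothetical wrapper: the runtime keeps its per-sequence parallelism
/// internal (threads here, but a process pool would look the same to Python).
#[pyclass]
struct Runtime {
    /// Placeholder configuration knob, exposed read-only to Python.
    #[pyo3(get)]
    num_workers: usize,
}

#[pymethods]
impl Runtime {
    #[new]
    fn new(num_workers: usize) -> Self {
        Runtime { num_workers }
    }

    /// Run one step for a batch of sequences. The GIL is released while the
    /// Rust workers run, so the Python side (e.g. vLLM's scheduler) is not blocked.
    fn step(&self, py: Python<'_>, seq_ids: Vec<u64>) -> Vec<u64> {
        py.allow_threads(|| {
            // Placeholder parallelism: one scoped thread per sequence.
            // Real code would reuse long-lived workers instead.
            thread::scope(|s| {
                let handles: Vec<_> = seq_ids
                    .iter()
                    .map(|&id| s.spawn(move || id /* compute bias for `id` */))
                    .collect();
                handles.into_iter().map(|h| h.join().unwrap()).collect()
            })
        })
    }
}

/// Hypothetical module name.
#[pymodule]
fn aicirt_py(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<Runtime>()?;
    Ok(())
}
```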
From #84:
Originally posted by @mmoskal in #84 (comment)