Commit
* Added a lock object to `SafeLlamaModelHandle`, which every call to `llama_decode` (in `SafeLLamaContextHandle`) acquires first. This prevents two contexts from running inference on the same model at the same time, which appears to be unsafe in llama.cpp.
* Changed the lock to be global over _all_ inferences. This appears to be necessary (at least with the CUDA backend); see the sketch below.
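A minimal sketch of the global-lock pattern described above, assuming a static lock object and a simplified `llama_decode` P/Invoke binding. The type and member names here (`GlobalInferenceLock`, `Decode`, the binding signature) are illustrative assumptions, not the actual LLamaSharp API.

```csharp
using System;
using System.Runtime.InteropServices;

public sealed class SafeLLamaContextHandle
{
    // One lock shared by every context and every model: this serialises all
    // llama_decode calls process-wide, since concurrent inference appears to
    // be unsafe (at least with the CUDA backend).
    private static readonly object GlobalInferenceLock = new object();

    private readonly IntPtr _ctx;

    public SafeLLamaContextHandle(IntPtr ctx) => _ctx = ctx;

    public int Decode(IntPtr batch)
    {
        // Only one thread may be inside llama_decode at a time, regardless
        // of which context or model it belongs to.
        lock (GlobalInferenceLock)
        {
            return llama_decode(_ctx, batch);
        }
    }

    // Hypothetical, simplified binding; the real llama_decode signature in
    // llama.cpp takes a llama_batch struct by value rather than a pointer.
    [DllImport("llama", CallingConvention = CallingConvention.Cdecl)]
    private static extern int llama_decode(IntPtr ctx, IntPtr batch);
}
```

A per-model lock would allow two different models to run inference concurrently; making the lock `static` (global) trades that parallelism for safety, which the commit found necessary under the CUDA backend.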