single binary #1888
Comments
Good point, and one I keep coming back to. One of the real challenges here is the libraries needed to make GPUs work. For instance, on Intel GPUs you need the Intel kit, which is expensive in terms of dependencies and container image size. A CPU-only user might just go with a very small image, without needing any dependencies for acceleration. However, starting with a single binary that bundles the pre-compiled versions for each specific flagset is the way to go; we can take care of the runtime dependencies later. We can already start by trying to squeeze all the backends built with each flagset into a single build.
I'm an AI bot, assisting with auditing tickets in the LocalAI repository. To address your feature request, it would be beneficial to start by trying to include all the backends built with each flagset in a single build. This can be a good starting point for working towards a single binary that can check capabilities and fall back when needed. While the concerns about the libraries needed to make GPUs work are valid, moving towards a single binary would indeed simplify AIO by handling the logic automatically inside the binary. Once this is achieved, we can then focus on taking care of the runtime dependencies. I'll update the issue with your input and thoughts. Please feel free to provide any further information or clarification as needed.
Updated the issue with subtasks.
Closing this card for now, as we have a single binary with CPU flagsets. We have opened follow-ups to track the remaining low-hanging fruit.
Is your feature request related to a problem? Please describe.
LocalAI should support a single binary instead of multiple options for AVX, AVX2, CUDA, etc.
Describe the solution you'd like
Support for a single binary that can check capabilities and fall back when needed. It should start with the GPU by checking libraries, then adjust layers if there is not enough VRAM, and finally fall back to the CPU and adjust the instruction set depending on the host capabilities.
This will make AIO simpler, as the logic will be handled automatically inside the binary.
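To make the requested fallback order concrete, here is a minimal Go sketch of the selection logic only. It is not LocalAI's actual implementation: the `Capabilities`, `Model`, and `Choice` types, the `Select` function, and the variant labels are all hypothetical, and the per-layer VRAM cost is an assumed simplification. Detecting the capabilities themselves (finding GPU libraries, reading CPU flags) is left out.

```go
package main

import "fmt"

// Capabilities is a hypothetical snapshot of what the host offers.
// How these fields get populated is outside the scope of this sketch.
type Capabilities struct {
	HasCUDALib  bool   // a usable GPU runtime library was found
	FreeVRAMMiB uint64 // free GPU memory in MiB
	HasAVX2     bool
	HasAVX      bool
}

// Model holds assumed resource needs: a layer count and a flat
// per-layer VRAM cost (a simplification for illustration).
type Model struct {
	Layers      int
	MiBPerLayer uint64
}

// Choice is the backend variant to run and how many layers to offload.
type Choice struct {
	Variant   string // hypothetical labels: "cuda", "avx2", "avx", "fallback"
	GPULayers int
}

// Select follows the order from the issue: try the GPU first, reduce
// the offloaded layers if VRAM is short, otherwise fall back to the
// best CPU instruction set the host supports.
func Select(c Capabilities, m Model) Choice {
	if c.HasCUDALib && m.Layers > 0 && m.MiBPerLayer > 0 {
		fit := c.FreeVRAMMiB / m.MiBPerLayer
		if fit > uint64(m.Layers) {
			fit = uint64(m.Layers)
		}
		if fit > 0 {
			return Choice{Variant: "cuda", GPULayers: int(fit)}
		}
	}
	return Choice{Variant: cpuVariant(c)}
}

// cpuVariant picks the most capable CPU build the host can run.
func cpuVariant(c Capabilities) string {
	switch {
	case c.HasAVX2:
		return "avx2"
	case c.HasAVX:
		return "avx"
	default:
		return "fallback"
	}
}

func main() {
	m := Model{Layers: 32, MiBPerLayer: 200}
	gpuHost := Capabilities{HasCUDALib: true, FreeVRAMMiB: 4096, HasAVX2: true, HasAVX: true}
	cpuHost := Capabilities{HasAVX: true}
	fmt.Printf("%+v\n", Select(gpuHost, m)) // partial offload: 4096/200 = 20 layers
	fmt.Printf("%+v\n", Select(cpuHost, m)) // no GPU library: CPU build with AVX
}
```

With something like this inside the binary, an AIO image would not need to choose a build up front; it would only need to ship the variants that `Select` can return.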
Subtasks:
Describe alternatives you've considered
Additional context