single binary #1888

sozercan · 2024-03-24T18:52:51Z

Is your feature request related to a problem? Please describe.

LocalAI should ship as a single binary instead of separate builds for avx, avx2, cuda, etc.

Describe the solution you'd like

Support a single binary that checks host capabilities and falls back when needed. It should start with the GPU by checking for the required libraries, then adjust the number of GPU layers if there is not enough VRAM, and finally fall back to the CPU and pick the instruction set according to the host's capabilities.

This will make AIO simpler, as the logic will be handled automatically inside the binary.
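The fallback order described above can be sketched in Go as follows. This is only an illustration of the idea, not LocalAI's implementation: the `Capabilities` struct, the `selectVariant` and `gpuLayers` helpers, and the variant names are hypothetical, and the layer estimate assumes every layer costs the same amount of VRAM.

```go
package main

import "fmt"

// Capabilities is a hypothetical summary of what the host offers.
// A real binary would fill these fields by probing the system.
type Capabilities struct {
	CUDALibs  bool // CUDA runtime libraries were found
	GPUDevice bool // at least one GPU device is visible
	AVX2      bool // CPU supports AVX2
	AVX       bool // CPU supports AVX
}

// selectVariant walks the fallback chain from the request:
// GPU first, then the best available CPU instruction set,
// then a plain build with no special instructions.
func selectVariant(c Capabilities) string {
	switch {
	case c.CUDALibs && c.GPUDevice:
		return "cuda"
	case c.AVX2:
		return "avx2"
	case c.AVX:
		return "avx"
	default:
		return "fallback"
	}
}

// gpuLayers returns how many of the model's layers fit in free VRAM,
// assuming each layer needs bytesPerLayer bytes (a simplification).
func gpuLayers(freeVRAM, bytesPerLayer uint64, totalLayers int) int {
	if bytesPerLayer == 0 || totalLayers <= 0 {
		return 0
	}
	fit := freeVRAM / bytesPerLayer
	if fit > uint64(totalLayers) {
		return totalLayers
	}
	return int(fit)
}

func main() {
	// A CPU-only host with AVX2: no CUDA libraries, no GPU device.
	host := Capabilities{AVX2: true, AVX: true}
	fmt.Println(selectVariant(host))

	// 4 GiB of free VRAM, 256 MiB per layer, 32 layers in the model.
	fmt.Println(gpuLayers(4<<30, 256<<20, 32))
}
```

Keeping the selection a pure function of a capabilities struct means the probing code (library lookup, device enumeration, CPU feature flags) can be swapped or mocked without touching the fallback order.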

Subtasks:

embed avx, avx2 and fallback into localai
embed cuda into localai
auto select cpu runtimes (feat: auto select llama-cpp cpu variant #2305)
auto select cuda runtime (feat: auto select llama-cpp cuda runtime #2306)
better gpu detection by checking cuda libraries in addition to devices #3637
feat: automatically adjust default gpu_layers by available GPU memory #3541
compress before embed and decompress when extracting to save space #3638

Describe alternatives you've considered

Additional context

mudler · 2024-03-25T17:20:18Z

Good point, and one I keep thinking about. One of the real challenges here is the libraries needed to make GPUs work.

For instance, Intel GPUs need the Intel kit, which is expensive in terms of dependencies and container image size. A CPU-only user might prefer a very small image with no acceleration dependencies at all.

However, starting with a single binary that bundles the pre-compiled versions, each built with its specific flagset, is the way to go; we can take care of the runtime dependencies later. We can already start by trying to squeeze all the backends built with each flagset into a single build.

localai-bot · 2024-04-15T17:49:33Z

I'm an AI bot, assisting with auditing tickets in the LocalAI repository.

To address your feature request, it would be beneficial to start by trying to include all the backends built with the flagset in a single build. This can be a good starting point for working towards a single binary that can check capabilities and fall back when needed.

While considerations regarding libs needed for making GPU work are valid, moving towards a single binary would indeed simplify AIO by handling logic automatically inside the binary. Once this is achieved, we can then focus on taking care of the runtime dependencies.

I'll update the issue with your input and thoughts. Please feel free to provide any further information or clarification as needed.

sozercan · 2024-05-24T21:23:48Z

updated the issue with subtasks

mudler · 2024-09-23T17:32:54Z

Closing this card for now, as we have a single binary with CPU flagsets. We have opened follow-ups to track the remaining low-hanging fruit.

sozercan added the enhancement (New feature or request) label Mar 24, 2024

sozercan mentioned this issue Mar 25, 2024

[REQ] auto detect runtime for an inference container sozercan/aikit#181

Closed

1 task

mudler added the up for grabs (Tickets that no-one is currently working on), roadmap, and ux labels Mar 25, 2024

mudler mentioned this issue Mar 28, 2024

can we build non avx cpu aio images? #1916

Closed

sozercan self-assigned this Apr 22, 2024

sozercan removed the up for grabs (Tickets that no-one is currently working on) label Apr 22, 2024

This was referenced May 3, 2024

feat(llama.cpp): do not specify backends to autoload and add llama.cpp variants #2232

Merged

feat(startup): show CPU/GPU information with --debug #2241

Merged

feat(single-build): generate single binaries for releases #2246

Merged

sozercan mentioned this issue May 13, 2024

feat: auto select llama-cpp cpu variant #2305

Merged

1 task

sozercan mentioned this issue May 24, 2024

llama.cpp cuda detection does not work inside a container #2401

Closed

mudler closed this as completed Sep 23, 2024
