feat: WIP: Adjust GPU Layers #3737
Signed-off-by: Siddharth More <[email protected]>
…6b61dc98b87a` (mudler#3718) ⬆️ Update ggerganov/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
…dfbc9d51570c4e` (mudler#3719) ⬆️ Update ggerganov/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
…ler#3721) Signed-off-by: Ettore Di Giacinto <[email protected]>
Updated some formatting in the doc. Signed-off-by: JJ Asghar <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
…f07d9d7a6077` (mudler#3725) ⬆️ Update ggerganov/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
…ab770389bb442b` (mudler#3724) ⬆️ Update ggerganov/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
feat(multimodal): allow to template image placeholders Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
) * feat(vllm): add support for image-to-text Related to mudler#3670 Signed-off-by: Ettore Di Giacinto <[email protected]> * feat(vllm): add support for video-to-text Closes: mudler#2318 Signed-off-by: Ettore Di Giacinto <[email protected]> * feat(vllm): support CPU installations Signed-off-by: Ettore Di Giacinto <[email protected]> * feat(vllm): add bnb Signed-off-by: Ettore Di Giacinto <[email protected]> * chore: add docs reference Signed-off-by: Ettore Di Giacinto <[email protected]> * Apply suggestions from code review Signed-off-by: Ettore Di Giacinto <[email protected]> --------- Signed-off-by: Ettore Di Giacinto <[email protected]> Signed-off-by: Ettore Di Giacinto <[email protected]>
…b00e0223b6fa` (mudler#3731) ⬆️ Update ggerganov/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
…5c9e2b2529ff2c` (mudler#3730) ⬆️ Update ggerganov/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
We default to a soft kill; however, we might want to force-kill backends after a while to avoid hanging requests (which may hallucinate indefinitely) Signed-off-by: Ettore Di Giacinto <[email protected]>
If the LLM does not implement any logic for PredictStream, we close the channel immediately to not leave the process hanging. Signed-off-by: Ettore Di Giacinto <[email protected]>
…c8930d19f45773` (mudler#3735) ⬆️ Update ggerganov/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
…1bd8811a9b44` (mudler#3736) ⬆️ Update ggerganov/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
Signed-off-by: Siddharth More <[email protected]>
@mudler can you kindly check the PR approach and give some high-level feedback when possible? The next step I will add is some sort of GPU_Layer estimator based on:
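As a rough illustration of what a GPU layer estimator could compute (a sketch only: the inputs used here, free VRAM, a per-layer size, a reserved margin and the layer count, are assumptions for the example and are not taken from this PR, and estimateGPULayers is a hypothetical name):

```go
package main

import "fmt"

// estimateGPULayers is a hypothetical helper, not part of the PR. It returns how
// many of totalLayers fit into freeVRAM bytes, assuming every layer needs
// bytesPerLayer bytes and that reserveBytes must stay free for context and
// scratch buffers.
func estimateGPULayers(freeVRAM, bytesPerLayer, reserveBytes uint64, totalLayers int) int {
	if bytesPerLayer == 0 || totalLayers <= 0 || freeVRAM <= reserveBytes {
		return 0
	}
	fit := (freeVRAM - reserveBytes) / bytesPerLayer
	if fit > uint64(totalLayers) {
		// Everything fits; never report more layers than the model has.
		return totalLayers
	}
	return int(fit)
}

func main() {
	const gib = uint64(1) << 30
	// 8 GiB free, 1 GiB reserved, 32 layers of 256 MiB each.
	fmt.Println(estimateGPULayers(8*gib, 256<<20, 1*gib, 32))
}
```

A real estimator would likely need per-layer sizes from the model metadata rather than a single constant; the uniform size here only keeps the arithmetic visible.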
Signed-off-by: Siddharth More <[email protected]>
@@ -70,6 +70,7 @@ type RunCMD struct {
	Federated bool `env:"LOCALAI_FEDERATED,FEDERATED" help:"Enable federated instance" group:"federated"`
	DisableGalleryEndpoint bool `env:"LOCALAI_DISABLE_GALLERY_ENDPOINT,DISABLE_GALLERY_ENDPOINT" help:"Disable the gallery endpoints" group:"api"`
	LoadToMemory []string `env:"LOCALAI_LOAD_TO_MEMORY,LOAD_TO_MEMORY" help:"A list of models to load into memory at startup" group:"models"`
	AdjustGPULayers bool `env:"LOCALAI_ADJUST_GPU_LAYERS,ADJUST_GPU_LAYERS" help:"Enable OffLoading of model layers to GPU" group:"models"`
Minor nit: I would call this something like AutomaticallyAdjustGPULayers:
AdjustGPULayers bool `env:"LOCALAI_ADJUST_GPU_LAYERS,ADJUST_GPU_LAYERS" help:"Enable OffLoading of model layers to GPU" group:"models"`
AutomaticallyAdjustGPULayers bool `env:"LOCALAI_AUTO_ADJUST_GPU_LAYERS,ADJUST_GPU_LAYERS" help:"Enable Automatic OffLoading of model layers to GPU" group:"models"`
// GetNvidiaGpuInfo uses pkg nvml is a go binding around C API provided by libnvidia-ml.so
// to fetch GPU stats
func GetNvidiaGpuInfo() ([]GPUInfo, error) {
Minor nit, but it's good practice to keep acronyms uppercase, e.g. UnmarshalYAML, GetXYZ:
func GetNvidiaGpuInfo() ([]GPUInfo, error) {
func GetNvidiaGPUInfo() ([]GPUInfo, error) {
	}
}

func TestGetModelGGufData_URL_WithMockedEstimateModelMemoryUsage(t *testing.T) {
This is still a minor nit, but all other tests are using ginkgo - would you be open to using ginkgo here as well? Not a blocker in any case; at some point I will refactor things to be more consistent if needed.
)

// Interface for parsing different model formats
type LocalAIGGUFParser interface {
Small style nit (non-blocking): the code would be more reusable if the interface here required only ParseGGUFFile, with different parsers implementing their own ParseGGUFFile logic, for instance an Ollama parser, a Huggingface parser, etc.
The caller would then need to instantiate the needed parser down the line, for instance:
type GGUFParser interface {
	Parse(path string) (*ggufparser.GGUFFile, error)
}

// GetModelGGufData returns the resources estimation needed to load the model.
func GetModelGGufData(modelPath string, estimator ModelMemoryEstimator, ollamaModel bool) (*ModelEstimate, error) {
	ctx := context.Background()
	fmt.Println("ModelPath: ", modelPath)
	var ggufParser GGUFParser
	// Pick the parser that matches the model source
	switch {
	case isURL(modelPath):
		ggufParser = &RemoteFileParser{ctx, modelPath}
	case ollamaModel:
		ggufParser = &OllamaParser{ctx, modelPath}
	// ... other parsers here
	}
	// Parse with the selected parser, then hand the result to the estimator
	ggufData, err := ggufParser.Parse(modelPath)
	if err != nil {
		return nil, err
	}
	return estimator.Estimate(ggufData)
}
The fact that we pass an estimator down the line tells me that this should actually be part of ModelMemoryEstimator as well:
func (g GGUFEstimator) GetModelGGufData(modelPath string, ollamaModel bool) (*ModelEstimate, error) {
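A self-contained sketch of how this layout could fit together, under loud assumptions: GGUFFile is a stand-in for the library's parsed-file type, RemoteFileParser and LocalFileParser return fixed sizes instead of reading anything, and parserFor and EstimateBytes are hypothetical names that do not appear in the PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// GGUFFile is a stand-in for the parsed model metadata; the real type would
// come from the GGUF parsing library used by the PR.
type GGUFFile struct {
	Source    string
	SizeBytes uint64
}

// GGUFParser is the single-method interface suggested in the review.
type GGUFParser interface {
	Parse(path string) (*GGUFFile, error)
}

// RemoteFileParser is a hypothetical implementation that returns a fixed size
// so the example runs without network access.
type RemoteFileParser struct{}

func (RemoteFileParser) Parse(path string) (*GGUFFile, error) {
	return &GGUFFile{Source: "remote:" + path, SizeBytes: 4 << 30}, nil
}

// LocalFileParser is a hypothetical implementation that returns a fixed size
// so the example runs without disk access.
type LocalFileParser struct{}

func (LocalFileParser) Parse(path string) (*GGUFFile, error) {
	if path == "" {
		return nil, errors.New("empty model path")
	}
	return &GGUFFile{Source: "local:" + path, SizeBytes: 2 << 30}, nil
}

// GGUFEstimator owns parser selection, mirroring the method form above.
type GGUFEstimator struct{}

// parserFor picks a parser from the shape of the path (a simplified URL check).
func (GGUFEstimator) parserFor(path string) GGUFParser {
	if strings.HasPrefix(path, "http://") || strings.HasPrefix(path, "https://") {
		return RemoteFileParser{}
	}
	return LocalFileParser{}
}

// EstimateBytes parses the model and returns its size as a placeholder estimate.
func (e GGUFEstimator) EstimateBytes(path string) (uint64, error) {
	f, err := e.parserFor(path).Parse(path)
	if err != nil {
		return 0, err
	}
	return f.SizeBytes, nil
}

func main() {
	var e GGUFEstimator
	for _, p := range []string{"https://example.com/model.gguf", "/models/model.gguf"} {
		n, err := e.EstimateBytes(p)
		fmt.Println(p, n, err)
	}
}
```

Keeping selection inside the estimator means callers pass only a path, and adding a new source is a matter of adding one parser type and one branch in parserFor.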
@siddimore thanks for taking a stab at this, direction looks good here - just a few minor nits here and there, but definitely not blockers
thanks much @mudler, you are welcome!! I will improve the code and add some more testing. Appreciate the feedback and will fix the comments
Description
TODO
This PR fixes #3541
Notes for Reviewers
Signed commits