
Feature Request: Integrate Features from Ollama #18

Open
evertjr opened this issue Jul 16, 2024 · 8 comments

evertjr commented Jul 16, 2024

Description:
One of the reasons Ollama is so widely adopted as a tool to run local models is its ease of use and seamless integration with other tools. Users can simply install an app that starts a server on the machine along with a terminal CLI to download and manage models. It would be beneficial to integrate several features from Ollama into FastMLX to enhance user experience and functionality.

Suggested Features:

  1. Simple App to Start with the System:

    • Develop a lightweight desktop application that starts with the system and provides easy access to FastMLX's functionalities. This application should have a system tray icon for quick settings and access.
  2. CLI Client to Manage Models:

    • Implement a command-line interface (CLI) for downloading and managing models. This CLI should offer commands similar to those in Ollama (see the sketch after this list), such as:
      • fastmlx run gemma2 - To test the specified model in the terminal.
      • fastmlx pull gemma2 - To download a specified model.
      • fastmlx rm gemma2 - To remove a specified model.
      • fastmlx list - To list all downloaded models.
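
To make this concrete, here is a rough sketch of what such a CLI could look like in Python. The chat endpoint is the OpenAI-compatible one FastMLX already serves, but the model-management routes (POST/DELETE on /v1/models and their model_name parameter) are illustrative assumptions, not FastMLX's actual API:

```python
#!/usr/bin/env python3
"""Illustrative `fastmlx` CLI sketch -- management endpoint paths are assumptions."""
import argparse
import requests

BASE = "http://localhost:8000"  # FastMLX's default port, per this thread


def run(model: str) -> None:
    # One-shot prompt against the OpenAI-compatible chat endpoint.
    prompt = input(">>> ")
    r = requests.post(f"{BASE}/v1/chat/completions", json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    r.raise_for_status()
    print(r.json()["choices"][0]["message"]["content"])


def pull(model: str) -> None:
    # Hypothetical: ask the server to download/register a model.
    requests.post(f"{BASE}/v1/models", params={"model_name": model}).raise_for_status()


def rm(model: str) -> None:
    # Hypothetical: remove a downloaded model.
    requests.delete(f"{BASE}/v1/models", params={"model_name": model}).raise_for_status()


def list_models() -> None:
    # List models known to the server (OpenAI-style response shape assumed).
    r = requests.get(f"{BASE}/v1/models")
    r.raise_for_status()
    for m in r.json().get("data", []):
        print(m.get("id", m))


def main() -> None:
    p = argparse.ArgumentParser(prog="fastmlx")
    sub = p.add_subparsers(dest="cmd", required=True)
    for name in ("run", "pull", "rm"):
        sub.add_parser(name).add_argument("model")
    sub.add_parser("list")
    args = p.parse_args()
    if args.cmd == "run":
        run(args.model)
    elif args.cmd == "pull":
        pull(args.model)
    elif args.cmd == "rm":
        rm(args.model)
    else:
        list_models()


if __name__ == "__main__":
    main()
```
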
@stewartugelow

What sorts of things are you thinking about in terms of Ollama’s integration with other tools?

evertjr commented Jul 16, 2024

My main use case for Ollama is coding assistance. It works with the continue.dev extension in VS Code for both chat and tab autocomplete, and it works great. There are also native apps like MindMac for chatting with local models, and there's even a Raycast extension that can quickly receive selected files and text as context.

@stewartugelow

Here's what I could find on your ideas:

  1. Simple app to start with the system

Looks like this can be done with a combination of launchctl and a bundled AppleScript application, according to ChatGPT (a rough Python helper sketch is at the end of this comment):

https://chatgpt.com/share/06771026-c28d-4019-9fab-c4dbd5af4a1d

  2. CLI client to manage models

The code for this essentially exists in the API endpoints, so it's just a matter of whether @Blaizzy wants to mimic the Ollama syntax in the CLI, too.

  3. Integration with other tools

I looked at the examples you gave, and, as far as I can tell, those tools are simply including presets of the API endpoints and chat model as a convenience (see the sketch right after this list). Maybe a section could be added to the docs on "integrating FastMLX into your app" that users could send to application developers, although @Blaizzy might want to pick a default port other than 8000 that would be unlikely to conflict with other FastAPI installs. (Ollama uses port 11434 as its default.)
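
To make that concrete, this is basically all one of those tools does, just preset for the user. Port 8000 and the model id below are assumptions; swap in whatever you actually run:

```python
# "Integration" for most tools: point an OpenAI-compatible client at the
# local FastMLX server. Port 8000 and the model id are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="fastmlx")

response = client.chat.completions.create(
    model="mlx-community/gemma-2-9b-it-4bit",  # example model id
    messages=[{"role": "user", "content": "Hello from a local client!"}],
)
print(response.choices[0].message.content)
```

If the default port changes, only the base_url changes here, which is why picking a non-conflicting default matters more than the specific number.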

Hope this helps!
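
P.S. On point 1, here is a hypothetical Python helper that registers FastMLX as a macOS LaunchAgent so the server starts at login. The com.example.fastmlx label and the assumption that a fastmlx command exists on PATH are mine; adjust to however the server is actually launched:

```python
# Hypothetical helper that registers FastMLX as a macOS LaunchAgent so the
# server starts at login. Assumptions: a `fastmlx` command on PATH and the
# reverse-DNS label below.
import plistlib
import shutil
import subprocess
from pathlib import Path

LABEL = "com.example.fastmlx"  # hypothetical label
AGENT_PATH = Path.home() / "Library" / "LaunchAgents" / f"{LABEL}.plist"


def install_launch_agent() -> None:
    server_cmd = shutil.which("fastmlx")  # assumption: console script name
    if server_cmd is None:
        raise SystemExit("fastmlx executable not found on PATH")
    agent = {
        "Label": LABEL,
        "ProgramArguments": [server_cmd],
        "RunAtLoad": True,   # start when the user logs in
        "KeepAlive": True,   # restart the server if it exits
    }
    AGENT_PATH.parent.mkdir(parents=True, exist_ok=True)
    with AGENT_PATH.open("wb") as f:
        plistlib.dump(agent, f)
    # Load it immediately so the server starts without a re-login.
    subprocess.run(["launchctl", "load", str(AGENT_PATH)], check=True)


if __name__ == "__main__":
    install_launch_agent()
    print(f"Installed and loaded {AGENT_PATH}")
```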

Blaizzy commented Jul 16, 2024

Hey @evertjr and @stewartugelow

Thank you very much for the discussion!

I think I've got a high-level idea of the use cases.

Regarding number 1, I will make it part of LisaPro, my local coding assistant offering for Mac. We are launching at the end of this month.

Regarding number 2, as @stewartugelow suggested, the APIs already exist. It's a matter of adding those CLI commands.

Regarding number 3, @stewartugelow, could you add the examples here (#19) to the docs/examples directory?

stewartugelow commented Jul 18, 2024

Regarding number 1 — when do you sleep??? :)

Regarding number 3 — I’m happy to, but I’m not sure what you mean?

Blaizzy commented Aug 2, 2024

  1. Never :)

  2. I meant adding a code/cookbook example on how to integrate FastMLX with different apps.

Blaizzy commented Aug 2, 2024

@evertjr I will add the features you requested (CLI Client to Manage Models) here after #21. It will come alongside a FastMLX Python client so you can start FastMLX programmatically.

The lightweight app is on the backlog at the moment.
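
Until then, here is a very rough sketch of what starting FastMLX programmatically could look like, assuming the package exposes its FastAPI app importable as fastmlx:app (the real import path may differ, and the planned client will likely wrap this more nicely):

```python
# Hypothetical "start FastMLX programmatically" sketch. Assumption: the
# FastAPI app is importable as "fastmlx:app" (the real path may differ);
# a future official client would replace this.
import threading
import uvicorn


def start_server(port: int = 8000) -> uvicorn.Server:
    # Run the server in a background thread so the caller keeps control.
    config = uvicorn.Config("fastmlx:app", host="127.0.0.1", port=port, log_level="info")
    server = uvicorn.Server(config)
    threading.Thread(target=server.run, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    input("FastMLX (assumed) running on http://127.0.0.1:8000 -- press Enter to stop\n")
    server.should_exit = True  # uvicorn's graceful-shutdown flag
```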

viljark commented Aug 29, 2024

First of all, thank you for the awesome work! I would be very interested in the following Ollama features becoming available in fastmlx:

  1. keep_alive: controls how long the model stays loaded in memory following a request (default: 5m). I would like the model to be unloaded if it has not been used for a while.
  2. OLLAMA_MAX_LOADED_MODELS: set to zero for fully dynamic behavior based on VRAM capacity, or to a fixed number greater than 1 to limit the total number of loaded models regardless of VRAM capacity. This way model switching would be seamless, with other models simply being unloaded when VRAM limits are reached (more info).
  3. v1/models returning an OpenAI-API-compatible response that lists all installed local models, not just the loaded ones (see the example below).

These features would be really nice QoL improvements when you like to switch models often, run split chats with different models, and still keep system performance optimal. The biggest benefits of MLX are model loading speed, which on my M3 Mac is more than 2x faster than Ollama, and reduced memory usage.
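
For point 3, this is the kind of call I mean; today it only shows loaded models, but if it listed everything installed, client-side model pickers would work out of the box (default port 8000 assumed):

```python
# List models via the OpenAI-compatible endpoint, using the official
# OpenAI Python client pointed at a local FastMLX server (port 8000 assumed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="fastmlx")

for model in client.models.list().data:
    print(model.id)
```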
