
VLM UI

VLM UI is a web-based user interface for interacting with various Vision Language Models (VLMs).

It provides a convenient way to upload images, ask questions, and receive responses from the model.

VLM UI Screenshot

Features

  • Web-based interface using Gradio (see the sketch below)
  • Support for multiple VLM models
  • Image upload and processing
  • Real-time streaming responses
  • Dockerised deployment
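
As an illustration of the web interface and streaming features above, here is a rough, standalone sketch of how an image-plus-question Gradio app can stream its answers. It uses placeholder names and a stub model, not the code in this repository:

import time
import gradio as gr

# Stub "model": echoes the question word by word, purely to show how Gradio
# streams partial output yielded from a generator function.
def answer(image, question):
    if image is None or not question:
        yield "Please upload an image and enter a question."
        return
    partial = ""
    for word in f"(stub reply about the image) {question}".split():
        partial += word + " "
        time.sleep(0.05)
        yield partial  # each yield updates the response box in place

with gr.Blocks(title="VLM UI (sketch)") as demo:
    with gr.Row():
        image = gr.Image(type="pil", label="Image")
        with gr.Column():
            question = gr.Textbox(label="Question")
            response = gr.Textbox(label="Response")
            gr.Button("Ask").click(answer, inputs=[image, question], outputs=response)

demo.queue().launch(server_name="0.0.0.0", server_port=7860)

Yielding successive partial strings from the handler is what produces the real-time streaming behaviour: Gradio pushes each yielded value to the browser without any extra front-end code.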

Prerequisites

  • Docker
  • NVIDIA GPU with CUDA support (for running models)

Quick Start

  1. Clone the repository:

    git clone --depth=1 https://github.com/sammcj/vlm-ui.git
    cd vlm-ui
  2. Build and run the Docker container:

    docker build -t vlm-ui .
    docker run -d --gpus all -p 7860:7860 -e MODEL_NAME=OpenGVLab/InternVL2-8B vlm-ui
  3. Open your browser and navigate to http://localhost:7860 to access the VLM UI.

Configuration

You can customise the behaviour of VLM UI by setting the following environment variables:

  • SYSTEM_MESSAGE: The system message to use for the conversation (default: "Carefully follow the users request.")
  • TEMPERATURE: Controls randomness in the model's output (default: 0.3)
  • TOP_P: Controls diversity of the model's output (default: 0.7)
  • MAX_NEW_TOKENS: Maximum number of tokens to generate (default: 2048)
  • MAX_INPUT_TILES: Maximum number of image tiles to process (default: 12)
  • REPETITION_PENALTY: Penalizes repetition in the model's output (default: 1.0)
  • MODEL_NAME: The name of the model to use (default: OpenGVLab/InternVL2-8B)
  • LOAD_IN_8BIT: Whether to load the model in 8-bit precision (default: 1)

Example:

docker run -d --gpus all -p 7860:7860 \
  -e MODEL_NAME=OpenGVLab/InternVL2-8B \
  -e TEMPERATURE=0.3 \
  -e MAX_NEW_TOKENS=2048 \
  vlm-ui
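
For reference, here is a minimal sketch of how these environment variables could be read and turned into generation settings. It assumes a Hugging Face-style generation config; the names and structure are illustrative, not the repository's actual code:

import os

def env_float(name, default):
    return float(os.environ.get(name, default))

def env_int(name, default):
    return int(os.environ.get(name, default))

# Defaults mirror the list above.
MODEL_NAME = os.environ.get("MODEL_NAME", "OpenGVLab/InternVL2-8B")
SYSTEM_MESSAGE = os.environ.get("SYSTEM_MESSAGE", "Carefully follow the users request.")
LOAD_IN_8BIT = os.environ.get("LOAD_IN_8BIT", "1") in ("1", "true", "True")
MAX_INPUT_TILES = env_int("MAX_INPUT_TILES", 12)

generation_config = dict(
    temperature=env_float("TEMPERATURE", 0.3),
    top_p=env_float("TOP_P", 0.7),
    max_new_tokens=env_int("MAX_NEW_TOKENS", 2048),
    repetition_penalty=env_float("REPETITION_PENALTY", 1.0),
)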

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

  • Copyright Sam McLeod
  • This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

This app builds on the work of the following projects:
