Sa2VA API Service

This project provides a Flask-based REST API service that uses ByteDance's Sa2VA models for dense grounded understanding of images. Sa2VA (Segment Anything 2 Visual Assistant) is a multimodal large language model that combines SAM2 with LLaVA, enabling image analysis, question answering, and dense object segmentation.

Features

Model Capabilities:
- Question answering about images
- Optical character recognition (OCR)
- Handwriting recognition
- Visual prompt understanding
- Dense object segmentation
- Reported state-of-the-art results on image grounding and segmentation benchmarks

API Features:
- RESTful API endpoints
- Base64-encoded image input
- Custom text prompt per request
- GPU-accelerated inference
- Health check endpoint
- Docker containerization

Prerequisites

Docker
NVIDIA GPU with CUDA support
NVIDIA Container Toolkit installed

Installation

Clone the repository:

git clone https://github.com/lojik-ng/sa2va-docker-API-Service.git
cd sa2va-docker-API-Service

Build the Docker image:

docker build -t sa2va-service .

Run the container:

docker run --gpus all -p 3303:3303 sa2va-service

Environment Variables

You can customize the service using the following environment variables:

MODEL_PATH: Path to the model (default: "ByteDance/Sa2VA-1B")
TRANSFORMERS_CACHE: Cache directory for model files (default: "/app/model_cache")

Example using a different model:

docker run --gpus all -p 3303:3303 -e MODEL_PATH=ByteDance/Sa2VA-4B --name sa2va sa2va-service

Models:

The following models can be set as MODEL_PATH:

- ByteDance/Sa2VA-1B
- ByteDance/Sa2VA-4B
- ByteDance/Sa2VA-8B
- ByteDance/Sa2VA-26B
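As a sketch of how the environment variables and the model list fit together, the bash function below starts the container detached with a chosen model and rejects names outside the list above. It assumes the image was built as sa2va-service as in the Installation step; the function name run_sa2va, the DOCKER_CMD override, and the volume name sa2va-model-cache are hypothetical. Mounting a named volume at the default TRANSFORMERS_CACHE path (/app/model_cache) should keep downloaded model files across container re-creation, assuming the service writes its cache there.

```bash
#!/bin/bash
# Hypothetical helper: start the Sa2VA service detached with a given model.
# Set DOCKER_CMD=echo to print the docker command instead of running it.
run_sa2va() {
  local model=${1:-ByteDance/Sa2VA-1B}
  case "$model" in
    ByteDance/Sa2VA-1B|ByteDance/Sa2VA-4B|ByteDance/Sa2VA-8B|ByteDance/Sa2VA-26B) ;;
    *)
      echo "unsupported MODEL_PATH: $model" >&2
      return 1
      ;;
  esac
  # Assumed volume name, mounted at the default TRANSFORMERS_CACHE path
  # so model files survive container re-creation.
  ${DOCKER_CMD:-docker} run -d --gpus all -p 3303:3303 \
    -e MODEL_PATH="$model" \
    -v sa2va-model-cache:/app/model_cache \
    --name sa2va sa2va-service
}

# Usage:
# run_sa2va ByteDance/Sa2VA-4B
# DOCKER_CMD=echo run_sa2va ByteDance/Sa2VA-8B   # dry run, prints the command
```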

API Endpoints

1. Root Endpoint

URL: GET /
Description: Returns a welcome message and lists the available API endpoints
Response:

{
    "message": "Welcome to the Sa2VA API Server",
    "endpoints": {
        "/health": "GET - Health check and server status",
        "/process": "POST - Process image with text prompt"
    }
}

2. Health Check

URL: GET /health
Description: Returns the service status, uptime, start time, and configured model path
Response:

{
    "status": "running",
    "uptime": "0:10:30",
    "started_at": "2025-01-22T23:41:56",
    "model_path": "ByteDance/Sa2VA-1B",
    "endpoints": {
        "/health": "GET - Health check and server status",
        "/process": "POST - Process image with text prompt"
    }
}
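Loading a model can take some time, particularly on first start if weights must be downloaded, so scripts may want to wait for the service before sending requests. The bash sketch below polls GET /health until the response contains "status": "running" or a timeout expires. The function name wait_for_sa2va and its defaults (600 s total, 5 s between polls) are assumptions, and it assumes /health only reports "running" once the model can serve requests, which this README does not state explicitly.

```bash
#!/bin/bash
# Poll /health until it reports "status": "running" or the attempts run out.
# Arguments (all optional): URL, total wait in seconds, poll interval in seconds.
wait_for_sa2va() {
  local url=${1:-http://localhost:3303/health}
  local timeout=${2:-600}
  local interval=${3:-5}
  local attempts=$(( timeout / interval ))
  local i=0
  [ "$attempts" -ge 1 ] || attempts=1
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; the regex accepts compact or spaced JSON.
    if curl -fsS --max-time 5 "$url" 2>/dev/null | grep -q '"status": *"running"'; then
      return 0
    fi
    i=$(( i + 1 ))
    sleep "$interval"
  done
  echo "Sa2VA service not ready after ${timeout}s: $url" >&2
  return 1
}

# Usage:
# wait_for_sa2va && echo "Sa2VA is ready"
```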

3. Image Processing

URL: POST /process
Description: Process an image with the Sa2VA model
Headers:
- Content-Type: application/json
Request Body:

{
    "base64Image": "base64_encoded_image_string",
    "prompt": "your_analysis_prompt"
}

Response:

{
    "result": "Model's response to your prompt"
}
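Because the request body is JSON, the prompt must be escaped if it contains double quotes, backslashes, or newlines. The bash sketch below builds the body for POST /process by streaming the image through base64 (piping through tr -d '\n' rather than relying on the GNU-only base64 -w 0) and escaping the prompt. The names build_payload and json_escape are hypothetical, and json_escape only handles backslash, double quote, newline, carriage return, and tab; for arbitrary input a dedicated JSON tool such as jq is safer.

```bash
#!/bin/bash
# Hypothetical helpers for building a POST /process request body.

# Escape a string for use inside a JSON string literal.
# Handles only backslash, double quote, newline, carriage return and tab.
json_escape() {
  local s=$1 bs='\' dq='"'
  s=${s//"$bs"/"$bs$bs"}    # backslash -> \\
  s=${s//"$dq"/"$bs$dq"}    # double quote -> \"
  s=${s//$'\n'/"${bs}n"}    # newline -> \n
  s=${s//$'\r'/"${bs}r"}    # carriage return -> \r
  s=${s//$'\t'/"${bs}t"}    # tab -> \t
  printf '%s' "$s"
}

# Write {"base64Image": ..., "prompt": ...} to stdout.
# The image is streamed through base64, so it never becomes a shell argument.
build_payload() {
  local image=$1 prompt=$2
  if [ ! -r "$image" ]; then
    echo "cannot read image: $image" >&2
    return 1
  fi
  printf '{"base64Image": "'
  base64 < "$image" | tr -d '\n'    # strip line wrapping
  printf '", "prompt": "%s"}\n' "$(json_escape "$prompt")"
}

# Usage: pipe the body straight into curl (no temporary file needed).
# build_payload your_image.jpg 'Describe the "main" object.' |
#   curl -X POST -H "Content-Type: application/json" \
#     --data-binary @- http://localhost:3303/process
```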

Testing

Here's an example of how to test the API using curl:

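#!/bin/bash
set -euo pipefail

# Convert image to base64 on a single line
# (tr strips line wrapping, so this works with both GNU and macOS base64)
BASE64_IMAGE=$(base64 < your_image.jpg | tr -d '\n')

# Create JSON payload file and remove it when the script exits
PAYLOAD=$(mktemp)
trap 'rm -f "$PAYLOAD"' EXIT
cat > "$PAYLOAD" << EOF
{
  "base64Image": "${BASE64_IMAGE}",
  "prompt": "Analyze this image and describe what you see."
}
EOF

# Send request using the JSON file (--data-binary sends it unmodified)
curl -X POST \
  -H "Content-Type: application/json" \
  --data-binary @"$PAYLOAD" \
  http://localhost:3303/process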

Model Details

Sa2VA is built on InternVL2.5 and offers:

- Performance comparable to state-of-the-art MLLMs such as Qwen2-VL and InternVL2.5 on question-answering benchmarks
- Enhanced visual prompt understanding
- Strong dense object segmentation performance
- Support for both image and video analysis (this API service accepts image input only)

For more information about the model, see the Sa2VA GitHub repository.

License

This project uses the Sa2VA model which is subject to ByteDance's license terms. Please refer to the model's license information for usage terms and conditions.
