The project implements AI DIAL API for language models and embeddings from Vertex AI.
The following models support POST SERVER_URL/openai/deployments/DEPLOYMENT_NAME/chat/completions
endpoint along with optional support of /tokenize
and /truncate_prompt
endpoints:
Model | Deployment name | Modality | /tokenize |
/truncate_prompt |
tools/functions support |
---|---|---|---|---|---|
Gemini 2.0 Flash | gemini-2.0-flash-exp | (text/pdf/image/audio/video)-to-text | ✅ | ✅ | ✅ |
Gemini 2.0 Experimental | gemini-exp-1206 | (text/pdf/image/audio/video)-to-text | ✅ | ✅ | ✅ |
Gemini 2.0 Flash Thinking | gemini-2.0-flash-thinking-exp-1219 | text-to-text | ✅ | ✅ | ❌ |
Gemini 1.5 Pro | gemini-1.5-pro-[preview-0409|001|002] | (text/pdf/image/audio/video)-to-text | ✅ | ✅ | ✅ |
Gemini 1.5 Flash | gemini-1.5-flash-[001|002] | (text/pdf/image/audio/video)-to-text | ✅ | ✅ | ✅ |
Gemini 1.0 Pro Vision | gemini-pro-vision | (text/pdf/image/video)-to-text | ✅ | ✅ | ❌ |
Gemini 1.0 Pro | gemini-pro | text-to-text | ✅ | ✅ | ✅ |
Imagen 2 | imagegeneration@005 | text-to-image | ✅ | ✅ | ❌ |
PaLM 2 Chat Bison | chat-bison@001 | text-to-text | ✅ | ✅ | ❌ |
PaLM 2 Chat Bison | chat-bison@002 | text-to-text | ✅ | ✅ | ❌ |
PaLM 2 Chat Bison | chat-bison-32k@002 | text-to-text | ✅ | ✅ | ❌ |
Codey for Code Chat | codechat-bison@001 | text-to-text | ✅ | ✅ | ❌ |
Codey for Code Chat | codechat-bison@002 | text-to-text | ✅ | ✅ | ❌ |
Codey for Code Chat | codechat-bison-32k@002 | text-to-text | ✅ | ✅ | ❌ |
The models that support /truncate_prompt
do also support max_prompt_tokens
request parameter.
The following models support SERVER_URL/openai/deployments/DEPLOYMENT_NAME/embeddings
endpoint:
Model | Deployment name | Language support | Modality |
---|---|---|---|
Gecko Embeddings for Text V1 | textembedding-gecko@001 | English | text-to-embedding |
Gecko Embeddings for Text V3 | textembedding-gecko@003 | English | text-to-embedding |
Embeddings for Text | text-embedding-004 | English | text-to-embedding |
Gecko Embeddings for Text Multilingual | textembedding-gecko-multilingual@001 | Multilingual | text-to-embedding |
Embeddings for Text Multilingual | text-multilingual-embedding-002 | Multilingual | text-to-embedding |
Multimodal embeddings | multimodalembedding@001 | English | (text/image)-to-embedding |
This project uses Python>=3.11 and Poetry>=1.6.1 as a dependency manager.
Check out Poetry's documentation on how to install it on your system before proceeding.
To install requirements:
poetry install
This will install all requirements for running the package, linting, formatting and tests.
The recommended IDE is VSCode. Open the project in VSCode and install the recommended extensions.
The VSCode is configured to use PEP-8 compatible formatter Black.
Alternatively you can use PyCharm.
Set-up the Black formatter for PyCharm manually or install PyCharm>=2023.2 with built-in Black support.
Run the development server:
make serve
Open localhost:5001/docs
to make sure the server is up and running.
Copy .env.example
to .env
and customize it for your environment:
Variable | Default | Description |
---|---|---|
GOOGLE_APPLICATION_CREDENTIALS | Filepath to JSON with credentials | |
DEFAULT_REGION | Default region for Vertex AI (e.g. "us-central1") | |
GCP_PROJECT_ID | GCP project ID | |
LOG_LEVEL | INFO | Log level. Use DEBUG for dev purposes and INFO in prod |
AIDIAL_LOG_LEVEL | WARNING | AI DIAL SDK log level |
WEB_CONCURRENCY | 1 | Number of workers for the server |
DIAL_URL | URL of the core DIAL server. Optional. Used to access images stored in the DIAL File storage |
Run the server in Docker:
make docker_serve
Run the linting before committing:
make lint
To auto-fix formatting issues run:
make format
Run unit tests locally:
make test
Run unit tests in Docker:
make docker_test
Run integration tests locally:
make integration_tests
To remove the virtual environment and build artifacts:
make clean