A lightweight vanilla JavaScript implementation of the Gemini 2.0 Flash Multimodal Live API client. This project provides real-time interaction with Gemini's API through text, audio, video, and screen sharing capabilities.
This is a simplified version of Google's original React implementation, created in response to this issue.
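As a rough sketch of what such a client does under the hood, the Live API is reached over a WebSocket. The host and the `v1alpha` `BidiGenerateContent` method name below reflect the publicly documented endpoint at the time of writing and may change, so treat this as illustrative rather than definitive:

```javascript
// Hypothetical helper: builds the Live API WebSocket URL for a given key.
// The host and method path are assumptions based on the public v1alpha API.
const LIVE_API_HOST = 'generativelanguage.googleapis.com';

function buildLiveApiUrl(apiKey) {
  if (!apiKey) {
    throw new Error('An API key from Google AI Studio is required');
  }
  return `wss://${LIVE_API_HOST}/ws/` +
    'google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent' +
    `?key=${encodeURIComponent(apiKey)}`;
}

// In the browser, the client then opens the socket and sends a setup message:
// const ws = new WebSocket(buildLiveApiUrl(myKey));
// ws.onopen = () => ws.send(JSON.stringify({
//   setup: { model: 'models/gemini-2.0-flash-exp' }
// }));
```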
## Features

- Real-time chat with the Gemini 2.0 Flash Multimodal Live API
- Real-time audio responses from the model
- Real-time audio input from the user, allowing interruptions
- Real-time video streaming from the user's webcam
- Real-time screen sharing from the user's screen
- Function calling
- Transcription of the model's audio (if Deepgram API key provided)
- Built with vanilla JavaScript (no dependencies)
- Mobile-friendly
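Function calling works by declaring tools in the session setup message. The sketch below follows the Gemini API's tool schema (`functionDeclarations` inside `tools`), but the field names should be checked against the current docs, and `get_weather` is a made-up example function:

```javascript
// Illustrative sketch: a setup message declaring one callable function.
// `get_weather` and its parameters are hypothetical examples.
function buildSetupMessage(model, functionDeclarations) {
  return {
    setup: {
      model,
      tools: [{ functionDeclarations }],
    },
  };
}

const setupMsg = buildSetupMessage('models/gemini-2.0-flash-exp', [{
  name: 'get_weather',
  description: 'Look up the current weather for a city',
  parameters: {
    type: 'OBJECT',
    properties: { city: { type: 'STRING' } },
    required: ['city'],
  },
}]);

// ws.send(JSON.stringify(setupMsg));  // sent once, right after the socket opens
```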
## Prerequisites

- A modern web browser with WebRTC, WebSocket, and Web Audio API support
- A Google AI Studio API key
- A local web server to host `index.html`, e.g. `python -m http.server`, `npx http-server`, or the Live Server extension for VS Code
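Before connecting, it can help to verify the browser actually exposes the APIs listed above. A small feature check, written against a `scope` parameter so it can be exercised outside the browser, might look like:

```javascript
// Returns the names of required browser APIs missing from `scope`
// (pass `window` in the browser). Purely illustrative.
function missingApis(scope) {
  const required = {
    WebSocket: typeof scope.WebSocket === 'function',
    'Web Audio (AudioContext)': typeof scope.AudioContext === 'function',
    'WebRTC (getUserMedia)': Boolean(scope.navigator &&
      scope.navigator.mediaDevices &&
      typeof scope.navigator.mediaDevices.getUserMedia === 'function'),
  };
  return Object.keys(required).filter((name) => !required[name]);
}

// In the browser:
// const missing = missingApis(window);
// if (missing.length) alert(`Unsupported browser, missing: ${missing.join(', ')}`);
```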
## Getting Started

1. Get your API key from Google AI Studio.

2. Clone the repository:

   ```bash
   git clone https://github.com/ViaAnthroposBenevolentia/gemini-2-live-api-demo.git
   ```

3. Start the development server (adjust the port if needed):

   ```bash
   cd gemini-2-live-api-demo
   python -m http.server 8000
   # or: npx http-server -p 8000
   ```

   Alternatively, open `index.html` with the Live Server extension for VS Code.

4. Open the application at `http://localhost:8000`.

5. Open the settings at the top right, paste your API key, and click "Save".

6. (Optional) Get a free API key from Deepgram and paste it into the settings to enable real-time transcription of the model's audio.
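Transcription works by forwarding the model's PCM audio to Deepgram's live-streaming endpoint. The `/v1/listen` path and query parameters below follow Deepgram's documented streaming API, but the 24 kHz sample rate is an assumption about the model's audio output and worth verifying against your own stream:

```javascript
// Illustrative: build the Deepgram live-transcription WebSocket URL for raw
// 16-bit PCM. The 24 kHz default is an assumption about Gemini's audio output.
function buildDeepgramUrl({ sampleRate = 24000, language = 'en-US' } = {}) {
  const params = new URLSearchParams({
    encoding: 'linear16',
    sample_rate: String(sampleRate),
    language,
  });
  return `wss://api.deepgram.com/v1/listen?${params}`;
}

// In the browser, Deepgram accepts the API key as a WebSocket subprotocol:
// const ws = new WebSocket(buildDeepgramUrl(), ['token', deepgramApiKey]);
// ws.onmessage = (e) => {
//   const alt = JSON.parse(e.data).channel?.alternatives?.[0];
//   if (alt?.transcript) console.log(alt.transcript);
// };
```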
## Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.
## License

This project is licensed under the MIT License.