VisionaryAI is a Streamlit web application that uses Gemini Pro, Gemini Pro Vision, DALL-E 3, and Stable Diffusion XL to provide three main features: chatbot interaction, image captioning, and text-to-image generation.
- ChatBot: Engage in real-time conversations with the AI, powered by the Gemini Pro model.
- Image Captioning: Generate descriptive captions for your images using the Gemini Pro Vision model.
- Text to Image: Generate images using either DALL-E 3 or Stable Diffusion XL.
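The first two features can be sketched with the `google-generativeai` Python SDK. This is a minimal sketch, not the project's actual code: the helper names (`chat_reply`, `generate_caption`, `_load_model`) are hypothetical, and the model identifiers `gemini-pro` and `gemini-pro-vision` are assumptions based on the model names listed above, so they may need updating if Google has renamed or retired them. Text-to-image is left out because the source does not say how DALL-E 3 or Stable Diffusion XL is accessed.

```python
from typing import Any, List, Optional


def _load_model(model_name: str, api_key: str) -> Any:
    """Build a Gemini model client (hypothetical helper, not from the project)."""
    # Imported lazily so the functions below can be tested with a stand-in model
    # on machines where the SDK is not installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    return genai.GenerativeModel(model_name)


def chat_reply(message: str, api_key: str,
               history: Optional[List[Any]] = None,
               model: Optional[Any] = None) -> str:
    """Send one chat message and return the text of the reply."""
    if model is None:
        model = _load_model("gemini-pro", api_key)  # assumed model identifier
    chat = model.start_chat(history=history or [])
    return chat.send_message(message).text


def generate_caption(image: Any, prompt: str, api_key: str,
                     model: Optional[Any] = None) -> str:
    """Ask the vision model to caption an image (for example a PIL image)."""
    if model is None:
        model = _load_model("gemini-pro-vision", api_key)  # assumed identifier
    # Send only the image when the prompt is blank.
    parts = [prompt, image] if prompt.strip() else [image]
    return model.generate_content(parts).text
```

Passing a `model` argument bypasses the SDK entirely, which keeps the helpers easy to exercise in tests without a network call or an API key.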
- Clone the repository:

      git clone https://github.com/Abhrankan-Chakrabarti/GeminiFusion.git
      cd GeminiFusion
- Create a virtual environment (optional but recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install dependencies:

      pip install -r requirements.txt
- Set up environment variables:
  - Create a `.env` file in the root directory.
  - Add your Google API key:

        api_key=YOUR_GOOGLE_API_KEY
- Run the application:

      streamlit run app.py
- Features:
  - ChatBot: Navigate to the ChatBot section to start a conversation with the AI.
  - Image Captioning: Upload an image and enter a prompt to generate a caption.
  - Text to Image: Enter a text prompt to generate images using either DALL-E 3 or Stable Diffusion XL.
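The environment-variable step stores the key in `.env` under the name `api_key`. How `app.py` actually reads it is not shown here; the project may well use a library such as `python-dotenv`. As a rough, standard-library-only sketch of the idea, the hypothetical helpers below (`read_env_file`, `get_api_key`) look for `api_key` in the process environment first and then fall back to the `.env` file:

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (illustrative, not a full parser)."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env") -> str:
    """Return the Google API key, preferring the process environment over .env."""
    key = os.environ.get("api_key") or read_env_file(path).get("api_key")
    if not key:
        raise RuntimeError("api_key is not set; add it to .env or the environment")
    return key
```

Failing early with a clear error when the key is missing tends to be easier to diagnose than an authentication error surfacing later from the API call.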

Technologies used:
- Python
- Streamlit
- Google Gemini Pro
- Google Gemini Pro Vision
- DALL-E 3
- Stable Diffusion XL
We welcome contributions! Please see our contribution guidelines for more information.
This project is licensed under the MIT License. See the LICENSE file for details.