This project is a web application that allows users to upload images, generate captions using a pre-trained BLIP model, and display the images along with their captions in a gallery. The application is built using Gradio for the web interface and utilizes the `transformers` library for image captioning.
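As a rough sketch of what that captioning step looks like with `transformers` (the exact checkpoint and generation settings in `app.py` may differ; `Salesforce/blip-image-captioning-base` and `example.jpg` below are only illustrative choices):

```python
# Minimal BLIP captioning sketch; checkpoint and image path are illustrative.
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

# Load a pre-trained BLIP processor and captioning model.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Preprocess the image and generate a caption.
image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```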
## Features

- Upload images from your device or capture using a webcam (if on a laptop).
- Generate captions for uploaded images using the BLIP model.
- Display images and their captions in a gallery.
- Save images and captions to a JSON file for persistence.
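A minimal sketch of how Gradio can wire an uploaded image to a captioning function is shown below; the real `app.py` also handles the gallery, webcam capture, and persistence, and its function names may differ (`generate_caption` here is a placeholder):

```python
# Minimal sketch of a Gradio upload-and-caption interface (function name assumed).
import gradio as gr

def generate_caption(image):
    # In the real app this would call the BLIP model; a fixed string keeps the sketch self-contained.
    return "a placeholder caption"

demo = gr.Interface(
    fn=generate_caption,
    inputs=gr.Image(type="pil", label="Upload an image"),
    outputs=gr.Textbox(label="Caption"),
)

if __name__ == "__main__":
    # Binding to 0.0.0.0:7860 matches the address used in the Usage section below.
    demo.launch(server_name="0.0.0.0", server_port=7860)
```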
## Requirements

- Python 3.7+
- gradio
- Pillow
- torch
- transformers
- user_agents
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/bnarasimha/SnapInsight.git
  cd SnapInsight
  ```

- Create a virtual environment and activate it:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

## Usage

- Run the application:

  ```bash
  python app.py
  ```

- Open your web browser and navigate to http://0.0.0.0:7860.
- Upload an image or capture one using your webcam (if on a laptop).
- Click the "Submit" button to generate a caption and add the image to the gallery.
Alternatively, you can run the application by executing the `run.sh` script in the `scripts` folder, providing just the droplet name, Hugging Face token, and SSH fingerprint:

```bash
cd scripts
chmod +x run.sh
./run.sh
```
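How `run.sh` accepts these three values is defined by the script itself; the invocation below assumes positional arguments in that order and uses placeholders, so check the script before running (it may instead prompt for the values interactively):

```bash
# Assumed argument order with placeholder values; run.sh may differ or prompt interactively.
./run.sh <droplet-name> <huggingface-token> <ssh-fingerprint>
```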
## Project Structure

- `app.py`: Main application file.
- `requirements.txt`: List of required Python packages.
- `saved_images/`: Directory where uploaded images are saved.
- `image_data.json`: JSON file where image data (filenames and captions) are stored.
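The exact schema of `image_data.json` is defined by `app.py`; purely as an illustration (the field names and helper below are assumptions), persisting a captioned image might look like:

```python
# Illustrative persistence sketch; the real field names in image_data.json may differ.
import json
import os
from PIL import Image

def save_entry(image: Image.Image, caption: str, filename: str) -> None:
    # Save the uploaded image into saved_images/.
    os.makedirs("saved_images", exist_ok=True)
    image.save(os.path.join("saved_images", filename))

    # Load any existing entries, append the new one, and write the file back.
    entries = []
    if os.path.exists("image_data.json"):
        with open("image_data.json") as f:
            entries = json.load(f)
    entries.append({"filename": filename, "caption": caption})
    with open("image_data.json", "w") as f:
        json.dump(entries, f, indent=2)
```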
## License

This project is licensed under the MIT License. See the LICENSE file for details.