This guide provides detailed instructions on how to use the BLIP3 autocaptioning tools.
You can use the tools in three different ways: analyzing a single image or a directory of images with a command-line script, interacting with the model via a chat interface, or running a FastAPI application that accepts HTTP requests.
To analyze a single image, run the following command:
python src/analyze.py path/to/image.jpg "Describe the image"
To analyze all images in a directory, run the following command:
python src/analyze.py path/to/directory "Describe the image"
To save the AI's responses to text files, add the --save_response flag:
python src/analyze.py path/to/image.jpg "Describe the image" --save_response
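If you want to script several of these calls, you can wrap the CLI in a short Python loop. This is a minimal sketch that relies only on the command shown above; the image path and queries are placeholders.

```python
import subprocess

# Run the documented analyze.py CLI once per query for the same image.
# The script path and --save_response flag are the ones shown above;
# the queries themselves are just examples.
queries = ["Describe the image", "What is the dominant color in the image?"]

for query in queries:
    subprocess.run(
        ["python", "src/analyze.py", "path/to/image.jpg", query, "--save_response"],
        check=True,
    )
```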
You can interact with the BLIP3 model via a CLI chat interface.
- Navigate to the Project Directory:
cd path/to/your/project
- Run the Chat Interface:
python src/chat.py
- Interact with the Model:
- Provide the path to an image when prompted.
- Enter your query to describe or ask questions about the image.
- Type clear to clear the current session and provide a new image.

Example session:
Image path >>>>> path/to/image.jpg
Human: Describe the image
Assistant: The image shows a beautiful sunset over a mountain range with vibrant colors.
Human: What is the dominant color in the image?
Assistant: The dominant color in the image is orange, highlighting the sunset.
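At its core, the chat interface is a simple read-eval loop over the prompts shown in the session above. The sketch below illustrates that structure; load_model and generate_response are hypothetical stand-ins for the actual BLIP3 calls inside src/chat.py.

```python
def load_model():
    # Hypothetical stand-in for however src/chat.py loads BLIP3.
    raise NotImplementedError

def generate_response(model, image_path: str, query: str) -> str:
    # Hypothetical stand-in for the BLIP3 inference call.
    raise NotImplementedError

def chat() -> None:
    model = load_model()
    image_path = input("Image path >>>>> ")
    while True:
        query = input("Human: ")
        if query.strip().lower() == "clear":
            # "clear" starts a new session with a new image.
            image_path = input("Image path >>>>> ")
            continue
        print(f"Assistant: {generate_response(model, image_path, query)}")
```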
You can also use the BLIP3 autocaptioning tools via a FastAPI application. This allows you to interact with the tools through HTTP requests.
- Navigate to the Project Directory:
cd path/to/your/project
- Run the FastAPI Application:
python src/app.py
The FastAPI application will start and be available at http://127.0.0.1:8000.
To analyze an image, you can use a tool like curl or Postman to send a POST request to the /analyze endpoint.
curl -X POST "http://127.0.0.1:8000/analyze" \
-F "file=@path/to/your/image.jpg" \
-F "query=Describe the image" \
-F "max_new_tokens=768" \
-F "save_response=false"
To save the AI's responses along with the uploaded image, set the save_response field to true:
curl -X POST "http://127.0.0.1:8000/analyze" \
-F "file=@path/to/your/image.jpg" \
-F "query=Describe the image" \
-F "max_new_tokens=768" \
-F "save_response=true"
The analyze.py script accepts the following arguments (a sketch of a matching argument parser follows the list):
- path: Path to the image file or directory containing images.
- query: Query to ask about the image(s).
- --max_new_tokens: Maximum number of new tokens to generate (default: 768).
- --num_beams: Number of beams for beam search (default: 1).
- --save_response: Save the response to a text file.
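These options map onto a standard argparse setup. The following is a sketch reconstructed from the list above, not the actual source of src/analyze.py.

```python
import argparse

# Sketch of an argument parser matching the options documented above.
parser = argparse.ArgumentParser(description="BLIP3 autocaptioning")
parser.add_argument("path", help="Image file or directory containing images")
parser.add_argument("query", help="Query to ask about the image(s)")
parser.add_argument("--max_new_tokens", type=int, default=768,
                    help="Maximum number of new tokens to generate")
parser.add_argument("--num_beams", type=int, default=1,
                    help="Number of beams for beam search")
parser.add_argument("--save_response", action="store_true",
                    help="Save the response to a text file")
args = parser.parse_args()
```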
Example: analyze a single image.
Command:
python src/analyze.py example.jpg "Describe the image"
Output:
==> example.jpg: The image captures a serene scene of a white crane, its wings spread wide in a display of majesty, standing on the shore of a tranquil lake
Example: analyze a directory of images and save the responses.
Command:
python src/analyze.py images/ "Describe the images" --save_response
Output:
==> image1.jpg: The image captures a serene scene of a white crane, its wings spread wide in a display of majesty, standing on the shore of a tranquil lake
Response saved to: images/image1.txt
==> image2.jpg: The image portrays a woman lying on a grassy field
Response saved to: images/image2.txt
If you encounter any issues, please check the following:
- Ensure that all dependencies are installed correctly.
- Verify that the paths to the images are correct.
- Check the console output for any error messages and follow the suggestions provided.
For further assistance, feel free to open an issue on the GitHub repository.
Happy autocaptioning!