This project uses the Stable Diffusion model to extend images in user-specified directions (left, right, top, bottom), guided by a text prompt. The pipeline takes the input image and generates an extended version in the chosen direction(s).
- Extend images based on a text prompt.
- Specify one or more directions for the image extension: left, right, top, bottom.
- Simple and interactive user interface via Gradio.
- Open the Google Colab notebook.
- Run the setup cells, then the app cells, in the notebook.
- After execution, click on the Gradio link displayed in the notebook output.
- The Gradio link opens a live website where you can:
  - Upload an Image: Provide an image you want to extend.
  - Enter a Prompt: Describe the desired extension (e.g., "Lake and grass fields").
  - Select Directions: Choose one or more directions (left, right, top, bottom) for the extension.
  - Generate: Click the button to process and view the extended image.
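
The interface described in the steps above could be wired up roughly as follows. This is a minimal sketch, not the notebook's actual code: `extend_stub`, `build_demo`, and `DIRECTIONS` are hypothetical names, and the stub only validates its inputs and returns the image unchanged in place of the real extension step.

```python
DIRECTIONS = ["left", "right", "top", "bottom"]


def extend_stub(image, prompt, directions):
    """Hypothetical placeholder for the extension step: checks inputs, returns the image as-is."""
    if image is None:
        raise ValueError("Please upload an image.")
    if not prompt or not prompt.strip():
        raise ValueError("Please enter a prompt.")
    unknown = [d for d in directions if d not in DIRECTIONS]
    if not directions or unknown:
        raise ValueError(f"Choose at least one of {DIRECTIONS}.")
    return image


def build_demo(extend_fn=extend_stub):
    """Assemble the four controls from the steps above into a Gradio Blocks app."""
    # Imported here so the validation logic above can be used without Gradio installed.
    import gradio as gr

    with gr.Blocks() as demo:
        with gr.Row():
            image_in = gr.Image(type="pil", label="Upload an Image")
            image_out = gr.Image(label="Extended Image")
        prompt = gr.Textbox(label="Enter a Prompt", placeholder="Lake and grass fields")
        directions = gr.CheckboxGroup(choices=DIRECTIONS, label="Select Directions")
        generate = gr.Button("Generate")
        # Clicking the button passes the three inputs to extend_fn and shows its result.
        generate.click(extend_fn, inputs=[image_in, prompt, directions], outputs=image_out)
    return demo
```

In a Colab cell, calling `build_demo().launch(share=True)` would print a public link of the kind mentioned in the steps above.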
- Input:
  - An image to be extended.
  - A text prompt describing how the extension should look.
  - The direction(s) in which the image should be extended.
  - The number of images to generate at inference.
- Processing:
  - The image is masked according to the selected direction(s).
  - The masked image and the prompt are passed to the Stable Diffusion model.
  - The model generates an extended version of the image based on the prompt and direction(s).
- Output:
  - The extended image is displayed in the Gradio interface.
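
One plausible reading of the masking step is sketched below: the image is pasted onto a larger canvas, and a mask marks the new border as the region to generate. The function names, the 128-pixel default for `extend_px`, and the white-means-generate mask convention are assumptions made for illustration; the notebook may size and mask the image differently.

```python
def plan_canvas(width, height, directions, extend_px=128):
    """Return (canvas_size, paste_offset) for extending an image of the given size."""
    allowed = {"left", "right", "top", "bottom"}
    chosen = set(directions)
    if not chosen or not chosen <= allowed:
        raise ValueError(f"directions must be a non-empty subset of {sorted(allowed)}")
    left = extend_px if "left" in chosen else 0
    right = extend_px if "right" in chosen else 0
    top = extend_px if "top" in chosen else 0
    bottom = extend_px if "bottom" in chosen else 0
    return (width + left + right, height + top + bottom), (left, top)


def make_masked_inputs(image, directions, extend_px=128):
    """Build the padded image and its mask (assumed convention: white = generate, black = keep)."""
    from PIL import Image  # imported lazily; plan_canvas itself has no dependencies

    canvas_size, (x, y) = plan_canvas(image.width, image.height, directions, extend_px)
    canvas = Image.new("RGB", canvas_size, (0, 0, 0))
    canvas.paste(image.convert("RGB"), (x, y))
    mask = Image.new("L", canvas_size, 255)
    # Black out the area covered by the original image so it is preserved.
    mask.paste(0, (x, y, x + image.width, y + image.height))
    return canvas, mask
```

For example, extending a 512×512 image to the left by 128 pixels gives a 640×512 canvas with the original pasted at offset (128, 0).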
- Model: Stable Diffusion, fine-tuned for image extension tasks.
- Framework: Gradio for building the interactive interface.
- Deployment: The system runs entirely within Google Colab for easy setup and execution.
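
The generation step might look like the following if the model is served through the `diffusers` inpainting pipeline, which is an assumption rather than something the project states. `load_pipeline` and `generate_extensions` are hypothetical helpers, and `model_id` stands for whichever checkpoint the notebook actually loads.

```python
def load_pipeline(model_id):
    """Load an inpainting pipeline on the GPU (assumes diffusers and torch are installed)."""
    import torch
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    return pipe.to("cuda")


def generate_extensions(pipe, canvas, mask, prompt, num_images=1):
    """Run the pipeline on a padded image and its mask; returns a list of images."""
    # Stable Diffusion expects sizes divisible by 8, so round the canvas size down.
    width, height = (side - side % 8 for side in canvas.size)
    result = pipe(
        prompt=prompt,
        image=canvas,
        mask_image=mask,
        width=width,
        height=height,
        num_images_per_prompt=num_images,
    )
    return result.images
```

Passing the pipeline in as an argument keeps the model loaded once per Colab session and lets the function be exercised with a stand-in object when no GPU is available.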