"No edit" prompt doesn't help with most image reconstruction? #25

Open
SunzeY opened this issue Sep 12, 2024 · 3 comments


SunzeY commented Sep 12, 2024

I tested with many images, but most of the outputs show a large shift compared to the original image... Is anything wrong with my settings, e.g. t, cfg, or topk?

```python
from PIL import Image
from inference_solver import FlexARInferenceSolver

# Load the Omni model at 768 resolution
inference_solver = FlexARInferenceSolver(
    model_path="Alpha-VLLM/Lumina-mGPT-7B-768-Omni",
    precision="bf16",
    target_size=768,
)

# Ask the model to reproduce the input image unchanged
q1 = "No edit. <|image|>"
images = [Image.open("input.png")]
qas = [[q1, None]]

generated = inference_solver.generate(
    images=images,
    qas=qas,
    max_gen_len=8192,
    temperature=1.0,
    logits_processor=inference_solver.create_logits_processor(cfg=1.0, image_top_k=200),
)
a1 = generated[0]            # generated text answer
new_image = generated[1][0]  # reconstructed image
```

Here are my input image and output image. [images attached]

@zhaoshitian
Member

In our experiments, the CFG and top-k values affect the resulting image significantly. We recommend setting CFG above 3.0 and top-k between 2000 and 4000.
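A minimal sketch of applying this recommendation. `reconstruction_logits_kwargs` is a hypothetical helper (not part of Lumina-mGPT) that builds the keyword arguments for `create_logits_processor` and rejects values outside the ranges suggested above:

```python
def reconstruction_logits_kwargs(cfg: float = 4.0, image_top_k: int = 2000) -> dict:
    """Build kwargs for inference_solver.create_logits_processor,
    validated against the maintainer-recommended ranges
    (cfg > 3.0, 2000 <= image_top_k <= 4000)."""
    if cfg <= 3.0:
        raise ValueError("cfg should be greater than 3.0 for image reconstruction")
    if not 2000 <= image_top_k <= 4000:
        raise ValueError("image_top_k should be between 2000 and 4000")
    return {"cfg": cfg, "image_top_k": image_top_k}


# The original snippet would then pass these values instead of cfg=1.0, image_top_k=200:
# logits_processor = inference_solver.create_logits_processor(**reconstruction_logits_kwargs())
```

The original snippet's `cfg=1.0, image_top_k=200` falls well outside both ranges, which would explain the large shift from the input image.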

@ChrisLiu6
Contributor

ChrisLiu6 commented Sep 12, 2024

> [quotes the original post above: question, code, and images]

Note that the "No edit." prompt is zero-shot: it was not specifically used during training.

@SunzeY
Author

SunzeY commented Sep 12, 2024

> Note that the "No edit." prompt is zero-shot as it was not specially used during training

Does this mean that I have loaded the wrong model?
