Wonderful Work! Adaptation for MacOS and Mobile Devices and Text Rendering Limitations #24

Open
viiika opened this issue Nov 22, 2024 · 4 comments


viiika commented Nov 22, 2024

Adaptation for MacOS and Mobile Devices

Given the model's relatively small parameter size and efficient performance, I was wondering if there are any plans to adapt it for MacOS devices with M-series chips or mobile platforms like the iPhone. This would be an exciting development, and I'm happy to wait for such updates!
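As a rough starting point for trying this on Apple Silicon, PyTorch exposes an `mps` backend alongside `cuda`. The sketch below only covers device selection. Whether the Sana pipeline and its kernels actually run on `mps` is not confirmed anywhere in this thread, and `pick_device_name` is a hypothetical helper, not part of the Sana codebase.

```python
def pick_device_name(cuda_ok: bool, mps_ok: bool) -> str:
    """Hypothetical helper: prefer CUDA, then Apple's Metal backend (mps), then CPU."""
    if cuda_ok:
        return "cuda:0"
    if mps_ok:
        return "mps"
    return "cpu"


try:
    import torch

    cuda_ok = torch.cuda.is_available()
    # torch.backends.mps exists in PyTorch 1.12+; guard for older builds.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
except ImportError:
    # Assumption: report CPU when PyTorch is not installed.
    cuda_ok = mps_ok = False

device_name = pick_device_name(cuda_ok, mps_ok)
print(device_name)
```

The resulting name could replace the `"cuda:0" if torch.cuda.is_available() else "cpu"` expression in the test snippet, assuming the rest of the pipeline accepts that device.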

Text Rendering Limitations

Additionally, I’ve noticed some limitations in text rendering: the model seems to struggle with synthesizing more than a single word, or with words containing many letters. Any insights into the factors contributing to this? Is it related to the training data, the constraints of the text encoder, or something else?

Here’s a sample code snippet I used for testing:

import torch
from app.sana_pipeline import SanaPipeline
from torchvision.utils import save_image

# Use the first GPU if available, otherwise fall back to CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Fixed seed so the results are reproducible.
generator = torch.Generator(device=device).manual_seed(42)

# Build the 1.6B, 1024px pipeline from its config and load the released checkpoint.
sana = SanaPipeline("configs/sana_config/1024ms/Sana_1600M_img1024.yaml")
sana.from_pretrained("hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth")
# The quoted part is the text the sign should display.
prompt = 'a cyberpunk cat with a neon sign that says "Sana, test whether it has the long text rendering ability."'

image = sana(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=5.0,
    pag_guidance_scale=2.0,
    num_inference_steps=18,
    generator=generator,
)
# The output is in [-1, 1]; normalize it to a viewable range when saving.
save_image(image, 'sana.png', nrow=1, normalize=True, value_range=(-1, 1))

Results:
[Images: sana, sana2, sana3]

viiika changed the title from "Wonderful Work!" to "Wonderful Work! Adaptation for MacOS and Mobile Devices and Text Rendering Limitations" on Nov 22, 2024

viiika commented Nov 22, 2024

And if I increase steps to 64, it still fails.
[Images: sana2 (1), sana3 (1)]


viiika commented Nov 22, 2024

64 steps with prompt 'a cute cat with a neon sign that says "Meissonic"'. Both fail to render all the letters completely.
[Images: sana2 (2), sana3 (2)]
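The tests above vary two things by hand: the sign text and the number of steps. A minimal sketch of turning that into a small sweep follows, so that failures can be compared against word count and character count. `build_sweep` and the config keys are hypothetical names chosen for illustration; only the prompts, step counts, and seed come from this thread.

```python
from itertools import product


def build_sweep(sign_texts, step_counts, seeds):
    """Return one config dict per (text, steps, seed) combination."""
    template = 'a cute cat with a neon sign that says "{text}"'
    return [
        {
            "prompt": template.format(text=text),
            "num_inference_steps": steps,
            "seed": seed,
            # Simple length measures to correlate with rendering failures.
            "n_chars": len(text),
            "n_words": len(text.split()),
        }
        for text, steps, seed in product(sign_texts, step_counts, seeds)
    ]


sign_texts = [
    "Sana",
    "Meissonic",
    "Sana, test whether it has the long text rendering ability.",
]
configs = build_sweep(sign_texts, step_counts=[18, 64], seeds=[42])
for cfg in configs:
    print(cfg["n_words"], cfg["n_chars"], cfg["num_inference_steps"], cfg["prompt"])
```

Each config could then be passed to the `sana(...)` call from the first comment, with `torch.Generator(device=device).manual_seed(cfg["seed"])` as the generator.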

lawrence-cj (Collaborator) commented

The model has basic text rendering ability, but it isn't really intended for complex text rendering. IMO, the ability is tied to the dataset. For better results, fine-tune the model on a dataset that has the text rendering style you want.
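One possible first step toward such a fine-tuning set is to pick out samples whose captions quote the text that appears in the image. The sketch below is an assumption about how that filtering might look: the record layout (`image` and `caption` keys), the sample data, and the helper names are hypothetical and do not reflect Sana's actual training data format.

```python
import re

# Matches text inside straight or curly double quotes, e.g. says "Sana".
QUOTED = re.compile(r'["“]([^"”]+)["”]')


def quoted_texts(caption: str) -> list:
    """Return the non-empty quoted strings found in a caption."""
    return [m.strip() for m in QUOTED.findall(caption) if m.strip()]


def select_text_samples(records, min_chars=1):
    """Keep records whose caption quotes at least one string of min_chars or more."""
    selected = []
    for rec in records:
        texts = [t for t in quoted_texts(rec["caption"]) if len(t) >= min_chars]
        if texts:
            selected.append({**rec, "rendered_text": texts})
    return selected


# Hypothetical records, for illustration only.
records = [
    {"image": "0001.png", "caption": 'a neon sign that says "Meissonic"'},
    {"image": "0002.png", "caption": "a cat sitting on a sofa"},
    {"image": "0003.png", "caption": 'a poster reading "Open" above a door'},
]
subset = select_text_samples(records, min_chars=5)
print([r["image"] for r in subset])
```

Raising `min_chars` biases the subset toward longer words, which is where the failures reported above appear.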


viiika commented Nov 22, 2024

> The model has basic text rendering ability, but it isn't really intended for complex text rendering. IMO, the ability is tied to the dataset. For better results, fine-tune the model on a dataset that has the text rendering style you want.

Thanks for your prompt reply. That's pretty useful information for future research.
