Wonderful Work! Adaptation for MacOS and Mobile Devices and Text Rendering Limitations #24

viiika · 2024-11-22T07:14:35Z

Adaptation for MacOS and Mobile Devices

Given the model's relatively small parameter size and efficient performance, I was wondering if there are any plans to adapt it for MacOS devices with M-series chips or mobile platforms like the iPhone. This would be an exciting development, and I'm happy to wait for such updates!
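For context on the M-series question: PyTorch exposes an `mps` backend for Apple silicon, so a test script's device selection could be written to prefer it when CUDA is absent. This is only a minimal sketch. Nothing in this thread confirms that the Sana pipeline runs on `mps`, and the helper names `choose_device_name` and `detect_device_name` are hypothetical.

```python
def choose_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Pick a device string: CUDA first, then Apple-silicon MPS, then CPU."""
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device_name() -> str:
    """Query PyTorch for available backends; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps only exists in PyTorch builds that ship the MPS backend.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return choose_device_name(torch.cuda.is_available(), mps_ok)


device_name = detect_device_name()
print(device_name)
```

The result could be passed to `torch.device(...)`. Whether every operator the model needs is implemented on `mps` would still have to be checked.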

Text Rendering Limitations

Additionally, I’ve noticed some limitations in text rendering: the model seems to struggle with synthesizing more than a single word, or with longer words. Any insight into what contributes to this? Is it related to the training data, the constraints of the text encoder, or something else?

Here’s a sample code snippet I used for testing:

import torch
from app.sana_pipeline import SanaPipeline
from torchvision.utils import save_image

# Use the first CUDA GPU if available, otherwise fall back to CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Fixed seed so the run is reproducible.
generator = torch.Generator(device=device).manual_seed(42)

# Build the pipeline from the 1600M / 1024px config and load the matching checkpoint.
sana = SanaPipeline("configs/sana_config/1024ms/Sana_1600M_img1024.yaml")
sana.from_pretrained("hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth")
# The quoted sentence is the text the neon sign should display.
prompt = 'a cyberpunk cat with a neon sign that says "Sana, test whether it has the long text rendering ability."'

image = sana(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=5.0,
    pag_guidance_scale=2.0,
    num_inference_steps=18,
    generator=generator,
)
# Rescale pixel values from [-1, 1] before writing the PNG.
save_image(image, 'sana.png', nrow=1, normalize=True, value_range=(-1, 1))

Results:


viiika · 2024-11-22T07:27:05Z

And if I increase steps to 64, it still fails.

viiika · 2024-11-22T07:51:15Z

I also tried 64 steps with the prompt 'a cute cat with a neon sign that says "Meissonic"'. Both prompts fail to render all the letters completely.
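The comparisons in these comments (18 vs. 64 steps, two prompts) can be organized as a small grid so that every run starts from the same seed. A minimal sketch, assuming a callable that accepts the same keyword arguments as the `sana(...)` call in the first comment. `run_grid`, `make_generator`, and `fake_pipeline` are hypothetical names, and the stand-in pipeline only returns a string so the sketch runs without model weights.

```python
from itertools import product
from typing import Any, Callable, Dict, Iterable, Tuple


def run_grid(
    pipeline: Callable[..., Any],
    prompts: Iterable[str],
    step_counts: Iterable[int],
    make_generator: Callable[[], Any],
    **fixed_kwargs: Any,
) -> Dict[Tuple[str, int], Any]:
    """Call the pipeline once per (prompt, step count) pair."""
    results = {}
    for prompt, steps in product(list(prompts), list(step_counts)):
        results[(prompt, steps)] = pipeline(
            prompt=prompt,
            num_inference_steps=steps,
            generator=make_generator(),  # fresh generator so each run starts from the same seed
            **fixed_kwargs,
        )
    return results


def fake_pipeline(**kwargs: Any) -> str:
    """Stand-in for the real pipeline; echoes the settings it received."""
    return f"{kwargs['num_inference_steps']} steps: {kwargs['prompt']}"


prompts = [
    'a cyberpunk cat with a neon sign that says "Sana, test whether it has the long text rendering ability."',
    'a cute cat with a neon sign that says "Meissonic"',
]
outputs = run_grid(
    fake_pipeline,
    prompts=prompts,
    step_counts=[18, 64],
    make_generator=lambda: None,  # real run: lambda: torch.Generator(device=device).manual_seed(42)
    guidance_scale=5.0,
    pag_guidance_scale=2.0,
)
```

Creating a new generator per call matters because a reused generator advances its state, so a later run would no longer share the first run's noise.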

lawrence-cj · 2024-11-22T08:58:50Z

Sana has basic text rendering ability; it is not really meant for generating complex text. IMO, this ability is related to the training dataset. For better performance, it would be better to fine-tune the model on a dataset that has the text rendering style you want.
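One way to start on the fine-tuning suggestion is to pick out training captions that describe rendered text. A minimal sketch, assuming captions are plain strings and that text meant to appear in the image is written inside double quotes. `quoted_spans`, `has_rendered_text`, and the `min_chars` threshold are hypothetical and not part of Sana's training code.

```python
import re
from typing import List

# Matches text enclosed in straight or curly double quotes.
QUOTED = re.compile('["“]([^"”]+)["”]')


def quoted_spans(caption: str) -> List[str]:
    """Return the quoted substrings of a caption, stripped of surrounding whitespace."""
    return [span.strip() for span in QUOTED.findall(caption)]


def has_rendered_text(caption: str, min_chars: int = 6) -> bool:
    """True if any quoted span is at least min_chars characters long."""
    return any(len(span) >= min_chars for span in quoted_spans(caption))


captions = [
    'a cute cat with a neon sign that says "Meissonic"',
    "a cyberpunk cat on a rooftop",
    'a mug labeled "Hi"',
]
selected = [c for c in captions if has_rendered_text(c)]
print(selected)
```

Raising `min_chars` would bias the selection toward the longer words and phrases that the tests in this thread failed on.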

viiika · 2024-11-22T09:04:52Z

Thanks for your prompt reply. That's pretty useful information for future research.

viiika changed the title ~~Wonderful Work!~~ Wonderful Work! Adaptation for MacOS and Mobile Devices and Text Rendering Limitations Nov 22, 2024
