Adaptation for macOS and Mobile Devices
Given the model's relatively small parameter size and efficient performance, are there any plans to adapt it for macOS devices with M-series chips or mobile platforms such as the iPhone? This would be an exciting development, and I'm happy to wait for such updates!
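In the meantime, here is a rough sketch of how device selection could fall back from CUDA to Apple's Metal (MPS) backend on an M-series Mac. This assumes the model's operators are supported by PyTorch's MPS backend, which is untested here; `pick_device` is a hypothetical helper, not part of the Sana codebase:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's Metal (MPS) backend, then CPU."""
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"

# Hypothetical wiring into the pipeline (not verified against the Sana repo):
#   import torch
#   device = torch.device(pick_device(torch.cuda.is_available(),
#                                     torch.backends.mps.is_available()))

print(pick_device(False, True))  # on an M-series Mac without CUDA: mps
```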
Text Rendering Limitations
Additionally, I’ve noticed some limitations in the text rendering ability: the model seems to struggle with synthesizing more than a single word, or with words containing many letters. Any insight into the factors behind this? Is it related to the training data, the constraints of the text encoder, or perhaps something else?
Here’s a sample code snippet I used for testing:

```python
import torch
from torchvision.utils import save_image

from app.sana_pipeline import SanaPipeline

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
generator = torch.Generator(device=device).manual_seed(42)

# Load the 1024px Sana-1600M config and pretrained weights
sana = SanaPipeline("configs/sana_config/1024ms/Sana_1600M_img1024.yaml")
sana.from_pretrained("hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth")

prompt = 'a cyberpunk cat with a neon sign that says "Sana, test whether it has the long text rendering ability."'

image = sana(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=5.0,
    pag_guidance_scale=2.0,
    num_inference_steps=18,
    generator=generator,
)
save_image(image, 'sana.png', nrow=1, normalize=True, value_range=(-1, 1))
```
Results:
viiika changed the title from "Wonderful Work!" to "Wonderful Work! Adaptation for MacOS and Mobile Devices and Text Rendering Limitations" on Nov 22, 2024.
The model has basic text rendering ability, but it is not intended for generating complex rendered text. IMO, this ability is related to the dataset: it's better to fine-tune the model on a dataset with the text-rendering style you want for better performance.
Thanks for your prompt reply. That's pretty useful information for future research.
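To illustrate the fine-tuning suggestion above, one rough way to assemble a text-rendering subset is to keep only captions that contain a quoted string, since prompts like the one in my snippet mark the text to render with quotes. This is just a sketch; `has_quoted_text` is a hypothetical helper, not part of the Sana codebase:

```python
import re

def has_quoted_text(caption: str) -> bool:
    """Return True if the caption contains a double-quoted phrase,
    a common marker of explicit text-rendering prompts."""
    return re.search(r'"[^"]+"', caption) is not None

captions = [
    'a cyberpunk cat with a neon sign that says "Sana"',
    "a photo of a mountain lake at sunrise",
]
subset = [c for c in captions if has_quoted_text(c)]
print(len(subset))  # -> 1
```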