-
The models are designed for 1024x1024 generation, so generate at at least 768x768 if possible. Use https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors and avoid the other models in the repo.
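For reference, grabbing that exact file from a command prompt could look like the sketch below; the /resolve/ download URL and the models\Stable-diffusion target folder are assumptions based on the usual Hugging Face and webui layouts, and the install path is a placeholder.

```bat
rem Placeholder install path - adjust to your own webui folder.
cd /d C:\path\to\stable-diffusion-webui-directml\models\Stable-diffusion
rem Assumed direct-download URL (blob/ swapped for resolve/).
curl -L -O https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0_0.9vae.safetensors
```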
-
OK, making some progress. Per the readme on Hugging Face, the diffusers, invisible-watermark, transformers, accelerate, and safetensors packages need to be updated. Using a similar method to the above, use activate.bat in ...venv\Scripts and pip install as follows (this can be saved as a .bat or just run step by step in a command prompt):
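The script itself didn't make it into this copy of the post; a minimal sketch of what such a .bat could look like (placeholder path, package names taken from the Hugging Face readme) is:

```bat
@echo off
rem Placeholder path - point this at your own stable-diffusion-webui-directml install.
cd /d C:\path\to\stable-diffusion-webui-directml
rem Activate the venv so pip targets the webui's own Python environment.
call venv\Scripts\activate.bat
pip install --upgrade diffusers invisible-watermark transformers accelerate safetensors
pause
```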
Next, I used the model with the baked VAE from @ClashSAN listed here: Finally, using --upcast-sampling in the command line args and Scaled Dot Product Attention set in the UI Optimization options, performance is decent enough (still getting the high-frequency, lower-power problem). Note that RDNA 1 (RX 5000 series) and/or RDNA 2 (RX 6000 series) cards may need different settings.
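For illustration, the relevant line in webui-user.bat might look something like the sketch below; only --upcast-sampling is confirmed in this comment, and the other flags (--no-half, --medvram) come from later comments in the thread, so treat the exact combination as an assumption.

```bat
rem Sketch only: --upcast-sampling is from this comment; --no-half and --medvram
rem appear elsewhere in the thread and may not match the exact setup used here.
set COMMANDLINE_ARGS=--upcast-sampling --no-half --medvram
```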
Example prompt: Negative: DPM++ 2M Karras, 32 steps, 768x1024. Time taken: 51.7 seconds. Command prompt reports ~1.5 seconds per iteration.
-
Did some testing removing --no-half: it results in the following messages being echoed:
Creating model from config: C:\Users\MYUSERNAME\STABLEDIFFUSION\WEBUI\stable-diffusion-webui-directml\repositories\generative-models\configs\inference\sd_xl_base.yaml
A tensor with all NaNs was produced in VAE.
-
Trying with ONLY --medvram results in the same: A tensor with all NaNs was produced in VAE. With the original arguments the message is not produced; however, generation time is a tiny bit slower, at about 1.18 seconds per iteration. (PS: I noticed that the units of performance echoed switch between s/it and it/s depending on the speed. It's like quoting miles per gallon for vehicle fuel efficiency, but instead of saying "0.5 mpg" saying "2 gpm".)
-
New test, same command line:
Regarding the sub-quadratic optimization & token merging:
Using the same prompt/settings/sampler from above ("...Man with fiery sword..."), with Token Merge Ratio = 0.5 and Cross Attention Optimization set to sub-quadratic (both in Settings -> Optimizations within the UI). Results seem similar to using SDP Attention: about 1.2 seconds/iteration. Generating two 768x1024 images, 32 steps, DPM++ 2M Karras gives ~1.3 seconds/iteration, or a total time of 1 min 44 seconds. Pretty happy overall with performance and results so far! Can't wait for the community to start making LoRAs, models, ControlNet, etc.!
-
A few updates after having some time to play with DreamShaper XL and test a bit more: the Refiner model is tricky and doesn't seem to work quite right. I use SDP Attention in the menu and a token merge ratio of 0.5. I've only had success using the refiner with:
You can also add --precision full:
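As a sketch (only --precision full is named here; the surrounding flags are assumptions carried over from the earlier comments):

```bat
rem Example only - flags other than --precision full are assumptions.
set COMMANDLINE_ARGS=--upcast-sampling --no-half --medvram --precision full
```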
However, I notice that --precision full only seems to increase the GPU temp. Junction temps on the 7900 XTX easily pass 95 °C, and I never see temps like that in any other workload. Will need more testing to see if --precision full has any benefit. "Success" using the refiner is quite loosely defined here; I feel like it actually makes the image worse. Aside from that, everything seems to be running quite well!
-
Is anybody here running SD XL with the DirectML deployment of Automatic1111? I downloaded the base SD XL model, the Refiner model, and the SD XL Offset Example LoRA from Hugging Face and put them in the appropriate folders.
I ran a git pull of the WebUI folder and also upgraded the Python packages from requirements.txt (see below for the script).
Testing a few basic prompts such as:
"full body, anatomical photorealistic digital painting portrait of high elf wizard, fabric with intricate pattern, casting a spell of lightning, in a fantasy atmosphere, highest quality"
Negative:
"jpeg artifacts, low quality, lowres, doll, plastic, blur"
30 steps, DPM++ 2M Karras, 512x768.
Results are nothing particularly noteworthy thus far:
So I'm thinking perhaps more needs to be done to run SD XL correctly?
Using a 7900 XTX, I'm seeing very low utilization of GPU resources. The image above took 74 seconds to generate. Webui-user startup args:
PS: To upgrade the WebUI repo & the Python requirements.txt in the venv in one step, just edit the following and save it as a .bat (I keep mine in the same folder as webui-user.bat):
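The script didn't survive the copy here, but a minimal sketch of the idea (placeholder path, standard git and pip commands) would be:

```bat
@echo off
rem Placeholder path - change to your own webui folder.
cd /d C:\path\to\stable-diffusion-webui-directml
rem Step 1: pull the latest WebUI code.
git pull
rem Step 2: update the Python packages inside the venv.
call venv\Scripts\activate.bat
pip install --upgrade -r requirements.txt
pause
```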