Merge pull request #3728 from vladmandic/dev
merge dev to master
vladmandic authored Jan 29, 2025
2 parents 586ef9a + b2df5e4 commit 46464c4
Showing 99 changed files with 5,035 additions and 2,105 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/build_readme.yaml
@@ -2,8 +2,6 @@ name: update-readme

on:
workflow_dispatch:
schedule:
- cron: '0 */4 * * *'

jobs:
deploy:
104 changes: 100 additions & 4 deletions CHANGELOG.md
@@ -1,15 +1,111 @@
# Change Log for SD.Next

## Highlights for 2025-01-29

Two weeks since the last release, so it's time for an update!

*What's New?*
- New **Detailer** functionality, including the ability to use several new
face-restore models: *RestoreFormer, CodeFormer, GFPGan, GPEN-BFR*
- Support for new models/pipelines:
face-swapper with **Photomaker-v2** and video with **Fast-Hunyuan**
- Support for several new optimizations and accelerations:
Many **IPEX** improvements, native *torch fp8* support,
support for **PAB:Pyramid-attention-broadcast**, **ParaAttention** and **PerFlow**
- Fully built-in support for both model **weights merge** and model **component merge**
Finally replace that pesky VAE in your favorite model with a fixed one!
- Improved remote access control and reliability, plus better support for running inside containers
- And of course, hotfixes for all reported issues...

## Details for 2025-01-28

- **Contributing**:
- if you'd like to contribute, please see the updated [contributing](https://github.com/vladmandic/automatic/blob/dev/CONTRIBUTING) guidelines
- **Model Merge**
- replace model components and merge LoRAs
in addition to the existing model weights merge support, you can now also replace model components and merge LoRAs
merges can be tested in-memory without ever saving to disk, and the same tool can convert diffusers models to safetensors
*example*: replace the vae in your favorite model with a fixed one, swap the text encoder, etc.
*note*: limited to sdxl for now, additional models can be added depending on popularity
- **Detailer**:
- in addition to the standard detect & regenerate behavior, it can now also run face-restore models
- included models are: *CodeFormer, RestoreFormer, GFPGan, GPEN-BFR*
- **Face**:
- new [PhotoMaker v2](https://huggingface.co/TencentARC/PhotoMaker-V2) and reimplemented [PhotoMaker v1](https://huggingface.co/TencentARC/PhotoMaker)
compatible with sdxl models, generates pretty good results and it's faster than most other methods
select under *scripts -> face -> photomaker*
- new [ReSwapper](https://github.com/somanchiu/ReSwapper)
todo: experimental-only and unfinished, noted in the changelog only for future reference
- **Video**
- **hunyuan video** support for [FastHunyuan](https://huggingface.co/FastVideo/FastHunyuan)
simply select model variant and set appropriate parameters
recommended: sampler-shift=17, steps=6, resolution=720x1280, frames=125, guidance>6.0
- [PAB: Pyramid Attention Broadcast](https://oahzxl.github.io/PAB/)
- speed up generation by caching attention results between steps
- enable in *settings -> pipeline modifiers -> pab*
- adjust settings as needed: a wider timestep range means more acceleration but a higher accuracy drop
- compatible with most `transformer` based models: e.g. flux.1, hunyuan-video, ltx-video, mochi, etc. (a conceptual sketch of the caching idea follows below)
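The core idea can be sketched in a few lines of Python; this is only an illustration of the caching concept, not the actual PAB or SD.Next implementation, and the class and parameter names are invented for the example:

```python
import torch


class BroadcastAttention(torch.nn.Module):
    """Illustrative wrapper: reuse the previous step's attention output for a few
    consecutive denoising steps instead of recomputing it on every step."""

    def __init__(self, attn: torch.nn.Module, reuse_steps: int = 2):
        super().__init__()
        self.attn = attn                # the wrapped attention module
        self.reuse_steps = reuse_steps  # how many consecutive steps may reuse a cached result
        self.cached = None
        self.age = 0

    def forward(self, *args, **kwargs):
        if self.cached is not None and self.age < self.reuse_steps:
            self.age += 1
            return self.cached          # broadcast the cached attention output to this step
        self.cached = self.attn(*args, **kwargs)
        self.age = 0
        return self.cached
```

The actual method additionally restricts reuse to the configured timestep range, which is what the setting described above controls.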
- [ParaAttention](https://github.com/chengzeyi/ParaAttention)
- first-block caching that can significantly speed up generation by dynamically reusing partial outputs between steps
- available for: flux, hunyuan-video, ltx-video, mochi
- enable in *settings -> pipeline modifiers -> para-attention*
- adjust the residual diff threshold to balance speedup and accuracy:
higher values lead to more cache hits and speedups, but may also cause a larger accuracy drop (see the simplified sketch below)
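As a rough illustration of how a residual-diff threshold can gate cache reuse (a simplified sketch with assumed names, not the ParaAttention API):

```python
import torch


def first_block_cache_hit(current: torch.Tensor, previous: torch.Tensor, threshold: float = 0.1) -> bool:
    """If the first transformer block's output changed only slightly since the previous
    step (relative residual below the threshold), cached outputs of the remaining blocks
    can be reused instead of recomputed."""
    residual = (current - previous).abs().mean()
    reference = previous.abs().mean() + 1e-8  # avoid division by zero
    return (residual / reference).item() < threshold


# higher threshold -> more cache hits -> faster generation, but a larger accuracy drop
```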
- **IPEX**
- enable force attention slicing, fp64 emulation, jit cache
- use the US server by default on linux
- use pytorch test branch on windows
- extend the supported python versions
- improve sdpa dynamic attention
- **Torch FP8**
- uses torch `float8_e4m3fn` or `float8_e5m2` as data storage and performs dynamic upcasting to compute `dtype` as needed
- compatible with most `unet` and `transformer` based models: e.g. *sd15, sdxl, sd35, flux.1, hunyuan-video, ltx-video, etc.*
this is an alternative to `bnb`/`quanto`/`torchao` quantization on models/platforms/GPUs where those libraries are not available; a minimal sketch of the idea is shown below
- enable in *settings -> quantization -> layerwise casting*
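A minimal sketch of the storage-vs-compute split (assumes PyTorch 2.1+ for `float8` dtypes; illustrative only, not the exact code path SD.Next uses):

```python
import torch


class FP8Linear(torch.nn.Module):
    """Illustrative layer: weights are stored in float8 and upcast to the compute dtype
    on every forward pass; float8 is used purely as a storage format, no float8 math."""

    def __init__(self, linear: torch.nn.Linear, compute_dtype: torch.dtype = torch.bfloat16):
        super().__init__()
        self.compute_dtype = compute_dtype
        self.register_buffer('weight_fp8', linear.weight.detach().to(torch.float8_e4m3fn))
        self.register_buffer('bias', None if linear.bias is None else linear.bias.detach().to(compute_dtype))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.weight_fp8.to(self.compute_dtype)  # dynamic upcast, released after use
        return torch.nn.functional.linear(x.to(self.compute_dtype), weight, self.bias)
```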
- [PerFlow](https://github.com/magic-research/piecewise-rectified-flow)
- piecewise rectified flow as model acceleration
- use `perflow` scheduler combined with one of the available pre-trained [models](https://huggingface.co/hansyan)
- **Other**:
- **upscale**: new [asymmetric vae](Heasterian/AsymmetricAutoencoderKLUpscaler) upscaling method
- **gallery**: add http fallback for slow/unreliable links
- **splash**: add legacy mode indicator on splash screen
- **network**: extract thumbnail from model metadata if present
- **network**: setting value to disable use of reference models
- **Refactor**:
- **upscale**: code refactor to unify latent, resize and model based upscalers
- **loader**: ability to run in-memory models
- **schedulers**: ability to create model-less schedulers
- **quantization**: code refactor into dedicated module
- **dynamic attention sdpa**: more correct implementation and new trigger rate control
- **Remote access**:
- perform auth check on ui startup
- unified standard and modern-ui authentication methods & cleaned up auth logging
- detect & report local/external/public ip addresses if using `listen` mode
- detect *docker* enforced limits instead of system limits if running in a container (see the illustrative sketch below)
- warn if using public interface without authentication
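As a rough illustration of the difference between container-enforced and system limits, a hypothetical helper (not the actual SD.Next code) might read the cgroup limit directly:

```python
import os


def container_memory_limit_bytes():
    """Hypothetical helper: inside a container the cgroup limit is the effective memory
    ceiling, not the host total reported by the OS."""
    candidates = (
        '/sys/fs/cgroup/memory.max',                    # cgroup v2
        '/sys/fs/cgroup/memory/memory.limit_in_bytes',  # cgroup v1
    )
    for path in candidates:
        if os.path.exists(path):
            with open(path, encoding='utf-8') as f:
                value = f.read().strip()
            if value.isdigit():  # cgroup v2 reports 'max' when unlimited
                return int(value)
    return None  # no enforced limit found (likely not running in a container)
```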
- **Fixes**:
- non-full vae decode
- send-to image transfer
- sana vae tiling
- increase gallery timeouts
- update ui element ids
- modernui use local font
- unique font family registration
- mochi video number of frames
- mark large models that should offload
- avoid repeated optimum-quanto installation
- avoid reinstalling bnb if not cuda
- image metadata civitai compatibility
- xyz grid handle invalid values
- omnigen pipeline handle float seeds
- correct logging of docker status, thanks @kmscode
- fix omnigen
- fix docker status reporting
- vlm/vqa with moondream2
- rocm do not override triton installation
- port streaming model load to diffusers

## Update for 2025-01-15

25 changes: 16 additions & 9 deletions CONTRIBUTING
@@ -4,17 +4,24 @@ Pull requests from everyone are welcome

Procedure for contributing:

- Select SD.Next `dev` branch:
  <https://github.com/vladmandic/automatic/tree/dev>
- Create a fork of the repository on GitHub
  In the top right corner of GitHub, select "Fork"
  It's recommended to fork the latest version from the main branch to avoid any possible conflicting code updates
- Clone your forked repository to your local system
  `git clone https://github.com/<your-username>/<your-fork>`
- Make your changes
- Test your changes
- Lint your changes against code guidelines
  - `ruff check`
  - `pylint <folder>/<filename>.py`
- Push changes to your fork
- Submit a PR (pull request)
- Make sure that the PR is against the `dev` branch
- Update your fork before creating the PR so that it is based on the latest code
- Make sure that the PR does NOT include any unrelated edits
- Make sure that the PR does not include changes to submodules

Your pull request will be reviewed and, pending review results, merged into the `dev` branch
Dev merges to main are performed regularly and any PRs that are merged to `dev` will be included in the next main release
15 changes: 7 additions & 8 deletions README.md
@@ -17,7 +17,7 @@

- [Documentation](https://vladmandic.github.io/sdnext-docs/)
- [SD.Next Features](#sdnext-features)
- [Model support](#model-support)
- [Platform support](#platform-support)
- [Getting started](#getting-started)

@@ -32,7 +32,7 @@ All individual features are not listed here, instead check [ChangeLog](CHANGELOG
**Windows | Linux | MacOS | nVidia | AMD | IntelArc/IPEX | DirectML | OpenVINO | ONNX+Olive | ZLUDA**
- Platform specific autodetection and tuning performed on install
- Optimized processing with latest `torch` developments with built-in support for model compile, quantize and compress
Compile backends: *Triton | StableFast | DeepCache | OneDiff | TeaCache | etc.*
Quantization and compression methods: *BitsAndBytes | TorchAO | Optimum-Quanto | NNCF*
- Built-in queue management
- Built in installer with automatic updates and dependency management
@@ -82,6 +82,11 @@ SD.Next supports broad range of models: [supported models](https://vladmandic.gi
> [!WARNING]
> If you run into issues, check out [troubleshooting](https://vladmandic.github.io/sdnext-docs/Troubleshooting/) and [debugging](https://vladmandic.github.io/sdnext-docs/Debug/) guides
### Contributing

Please see [Contributing](CONTRIBUTING) for details on how to contribute to this project
For any questions, reach out on [Discord](https://discord.gg/VjvR2tabEX) or open an [issue](https://github.com/vladmandic/automatic/issues) or a [discussion](https://github.com/vladmandic/automatic/discussions)

### Credits

- Main credit goes to [Automatic1111 WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) for the original codebase
@@ -104,10 +109,4 @@ SD.Next supports broad range of models: [supported models](https://vladmandic.gi
If you're unsure how to use a feature, the best place to start is [Docs](https://vladmandic.github.io/sdnext-docs/) and if it's not there,
check [ChangeLog](https://vladmandic.github.io/sdnext-docs/CHANGELOG/) for when the feature was first introduced as it will always have a short note on how to use it

### Sponsors

<div align="center">
<!-- sponsors --><a href="https://github.com/allangrant"><img src="https://github.com/allangrant.png" width="60px" alt="Allan Grant" /></a><a href="https://github.com/mantzaris"><img src="https://github.com/mantzaris.png" width="60px" alt="a.v.mantzaris" /></a><a href="https://github.com/CurseWave"><img src="https://github.com/CurseWave.png" width="60px" alt="" /></a><a href="https://github.com/smlbiobot"><img src="https://github.com/smlbiobot.png" width="60px" alt="SML (See-ming Lee)" /></a><!-- sponsors -->
</div>

<br>
2 changes: 1 addition & 1 deletion extensions-builtin/sd-extension-system-info
56 changes: 45 additions & 11 deletions installer.py
@@ -447,7 +447,7 @@ def get_platform():
'system': platform.system(),
'release': release,
'python': platform.python_version(),
'docker': os.environ.get('SD_INSTALL_DEBUG', None) is not None,
'docker': os.environ.get('SD_DOCKER', None) is not None,
# 'host': platform.node(),
# 'version': platform.version(),
}
@@ -492,7 +492,7 @@ def check_diffusers():
t_start = time.time()
if args.skip_all or args.skip_git:
return
sha = 'b785ddb654e4be3ae0066e231734754bdb2a191c' # diffusers commit hash
sha = '7b100ce589b917d4c116c9e61a6ec46d4f2ab062' # diffusers commit hash
pkg = pkg_resources.working_set.by_key.get('diffusers', None)
minor = int(pkg.version.split('.')[1] if pkg is not None else 0)
cur = opts.get('diffusers_version', '') if minor > 0 else ''
@@ -625,6 +625,9 @@ def install_rocm_zluda():
else:
torch_command = os.environ.get('TORCH_COMMAND', f'torch torchvision --index-url https://download.pytorch.org/whl/rocm{rocm.version}')

if os.environ.get('TRITON_COMMAND', None) is None:
os.environ.setdefault('TRITON_COMMAND', 'skip') # pytorch auto installs pytorch-triton-rocm as a dependency instead

if sys.version_info < (3, 11):
ort_version = os.environ.get('ONNXRUNTIME_VERSION', None)
if rocm.version is None or float(rocm.version) > 6.0:
@@ -659,22 +662,39 @@

def install_ipex(torch_command):
t_start = time.time()
check_python(supported_minors=[10,11], reason='IPEX backend requires Python 3.10 or 3.11')
# Python 3.12 will cause compatibility issues with other dependencies
# IPEX supports Python 3.12 so don't block it but don't advertise it in the error message
check_python(supported_minors=[9, 10, 11, 12], reason='IPEX backend requires Python 3.9, 3.10 or 3.11')
args.use_ipex = True # pylint: disable=attribute-defined-outside-init
log.info('IPEX: Intel OneAPI toolkit detected')

if os.environ.get("NEOReadDebugKeys", None) is None:
os.environ.setdefault('NEOReadDebugKeys', '1')
if os.environ.get("ClDeviceGlobalMemSizeAvailablePercent", None) is None:
os.environ.setdefault('ClDeviceGlobalMemSizeAvailablePercent', '100')
if os.environ.get("SYCL_CACHE_PERSISTENT", None) is None:
os.environ.setdefault('SYCL_CACHE_PERSISTENT', '1') # Jit cache

if os.environ.get("PYTORCH_ENABLE_XPU_FALLBACK", None) is None:
os.environ.setdefault('PYTORCH_ENABLE_XPU_FALLBACK', '1')
os.environ.setdefault('PYTORCH_ENABLE_XPU_FALLBACK', '1') # CPU fallback for unsupported ops
if os.environ.get("OverrideDefaultFP64Settings", None) is None:
os.environ.setdefault('OverrideDefaultFP64Settings', '1')
if os.environ.get("IGC_EnableDPEmulation", None) is None:
os.environ.setdefault('IGC_EnableDPEmulation', '1') # FP64 Emulation
if os.environ.get('IPEX_FORCE_ATTENTION_SLICE', None) is None:
# XPU PyTorch doesn't support Flash Atten or Memory Atten yet so Battlemage goes OOM without this
os.environ.setdefault('IPEX_FORCE_ATTENTION_SLICE', '1')

if "linux" in sys.platform:
torch_command = os.environ.get('TORCH_COMMAND', 'torch==2.5.1+cxx11.abi torchvision==0.20.1+cxx11.abi intel-extension-for-pytorch==2.5.10+xpu oneccl_bind_pt==2.5.0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/')
# torch_command = os.environ.get('TORCH_COMMAND', 'torch torchvision --index-url https://download.pytorch.org/whl/test/xpu') # test wheels are stable previews, significantly slower than IPEX
# os.environ.setdefault('TENSORFLOW_PACKAGE', 'tensorflow==2.15.1 intel-extension-for-tensorflow[xpu]==2.15.0.1')
# default to US server. If The China server is needed, change .../release-whl/stable/xpu/us/ to .../release-whl/stable/xpu/cn/
torch_command = os.environ.get('TORCH_COMMAND', 'torch==2.5.1+cxx11.abi torchvision==0.20.1+cxx11.abi intel-extension-for-pytorch==2.5.10+xpu oneccl_bind_pt==2.5.0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/')
if os.environ.get('TRITON_COMMAND', None) is None:
os.environ.setdefault('TRITON_COMMAND', '--pre pytorch-triton-xpu==3.1.0+91b14bf559 --index-url https://download.pytorch.org/whl/nightly/xpu')
# os.environ.setdefault('TENSORFLOW_PACKAGE', 'tensorflow==2.15.1 intel-extension-for-tensorflow[xpu]==2.15.0.2')
else:
torch_command = os.environ.get('TORCH_COMMAND', '--pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/xpu') # torchvision doesn't exist on test/stable branch for windows
install(os.environ.get('OPENVINO_PACKAGE', 'openvino==2024.5.0'), 'openvino', ignore=True)
torch_command = os.environ.get('TORCH_COMMAND', 'torch==2.6.0+xpu torchvision==0.21.0+xpu --index-url https://download.pytorch.org/whl/test/xpu')

install(os.environ.get('OPENVINO_PACKAGE', 'openvino==2024.6.0'), 'openvino', ignore=True)
install('nncf==2.7.0', ignore=True, no_deps=True) # requires older pandas
install(os.environ.get('ONNXRUNTIME_PACKAGE', 'onnxruntime-openvino'), 'onnxruntime-openvino', ignore=True)
ts('ipex', t_start)
@@ -683,6 +703,8 @@ def install_ipex(torch_command):

def install_openvino(torch_command):
t_start = time.time()
# Python 3.12 will cause compatibility issues with other dependencies.
# OpenVINO supports Python 3.12 so don't block it but don't advertise it in the error message
check_python(supported_minors=[9, 10, 11, 12], reason='OpenVINO backend requires Python 3.9, 3.10 or 3.11')
log.info('OpenVINO: selected')
if sys.platform == 'darwin':
@@ -726,11 +748,22 @@ def install_torch_addons():
install('optimum-quanto==0.2.6', 'optimum-quanto')
if not args.experimental:
uninstall('wandb', quiet=True)
if triton_command is not None:
if triton_command is not None and triton_command != 'skip':
install(triton_command, 'triton', quiet=True)
ts('addons', t_start)


# check cudnn
def check_cudnn():
import site
site_packages = site.getsitepackages()
cuda_path = os.environ.get('CUDA_PATH', '')
for site_package in site_packages:
folder = os.path.join(site_package, 'nvidia', 'cudnn', 'lib')
if os.path.exists(folder) and folder not in cuda_path:
os.environ['CUDA_PATH'] = f"{cuda_path}:{folder}"


# check torch version
def check_torch():
t_start = time.time()
@@ -842,6 +875,7 @@ def check_torch():
return
if not args.skip_all:
install_torch_addons()
check_cudnn()
if args.profile:
pr.disable()
print_profile(pr, 'Torch')
@@ -1056,7 +1090,7 @@ def install_optional():
install('gfpgan')
install('clean-fid')
install('pillow-jxl-plugin==1.3.1', ignore=True)
install('optimum-quanto=0.2.6', ignore=True)
install('optimum-quanto==0.2.6', ignore=True)
install('bitsandbytes==0.45.0', ignore=True)
install('pynvml', ignore=True)
install('ultralytics==8.3.40', ignore=True)
4 changes: 2 additions & 2 deletions javascript/base.css
@@ -1,4 +1,4 @@
@font-face { font-family: 'NotoSans'; font-display: swap; font-style: normal; font-weight: 100; src: local('NotoSans'), url('notosans-nerdfont-regular.ttf') }
@font-face { font-family: 'NotoSans'; font-display: swap; font-style: normal; font-weight: 100; src: local('NotoSansNerd'), url('notosans-nerdfont-regular.ttf') }

/* toolbutton */
.gradio-button.tool { max-width: min-content; min-width: min-content !important; align-self: end; font-size: 1.4em; color: var(--body-text-color) !important; }
@@ -77,7 +77,7 @@ table.settings-value-table td { padding: 0.4em; border: 1px solid #ccc; max-widt
#extensions .info { margin: 0; }
#extensions .date { opacity: 0.85; font-size: 90%; }

/* extra networks */
/* networks */
.extra-networks > div { margin: 0; border-bottom: none !important; }
.extra-networks .second-line { display: flex; width: -moz-available; width: -webkit-fill-available; gap: 0.3em; box-shadow: var(--input-shadow); margin-bottom: 2px; }
.extra-networks .search { flex: 1; }
2 changes: 1 addition & 1 deletion javascript/black-gray.css
@@ -1,5 +1,5 @@
/* generic html tags */
@font-face { font-family: 'NotoSans'; font-display: swap; font-style: normal; font-weight: 100; src: local('NotoSans'), url('notosans-nerdfont-regular.ttf') }
@font-face { font-family: 'NotoSans'; font-display: swap; font-style: normal; font-weight: 100; src: local('NotoSansNerd'), url('notosans-nerdfont-regular.ttf') }
:root, .light, .dark {
--font: 'NotoSans';
--font-mono: 'ui-monospace', 'Consolas', monospace;