Merge pull request #3728 from vladmandic/dev
merge dev to master
vladmandic authored Jan 29, 2025
2 parents 586ef9a + b2df5e4 commit 46464c4
Showing 99 changed files with 5,035 additions and 2,105 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/build_readme.yaml
@@ -2,8 +2,6 @@ name: update-readme

on:
workflow_dispatch:
schedule:
- cron: '0 */4 * * *'

jobs:
deploy:
104 changes: 100 additions & 4 deletions CHANGELOG.md
@@ -1,15 +1,111 @@
# Change Log for SD.Next

## Highlights for 2025-01-29

Two weeks since the last release, so it's time for an update!

*What's New?*
- New **Detailer** functionality, including the ability to use several new
face-restore models: *RestoreFormer, CodeFormer, GFPGan, GPEN-BFR*
- Support for new models/pipelines:
face-swapper with **Photomaker-v2** and video with **Fast-Hunyuan**
- Support for several new optimizations and accelerations:
Many **IPEX** improvements, native *torch fp8* support,
support for **PAB:Pyramid-attention-broadcast**, **ParaAttention** and **PerFlow**
- Fully built-in support for both model **weights merge** and model **component merge**
Finally replace that pesky VAE in your favorite model with a fixed one!
- Improved remote access control and reliability, plus better support for running inside containers
- And of course, hotfixes for all reported issues...

## Details for 2025-01-28

- **Contributing**:
- if you'd like to contribute, please see the updated [contributing](https://github.com/vladmandic/automatic/blob/dev/CONTRIBUTING) guidelines
- **Model Merge**
- replace model components and merge LoRAs
in addition to the existing model weights merge support, you can now also replace model components and merge LoRAs
merges can be tested in-memory without ever saving to disk, and the same tool can convert diffusers models to safetensors
*example*: replace the vae in your favorite model with a fixed one, swap the text encoder, etc.
*note*: limited to sdxl for now, additional models can be added depending on popularity
- **Detailer**:
- in addition to the standard detect & regenerate behavior, it can now also run face-restore models
- included models are: *CodeFormer, RestoreFormer, GFPGan, GPEN-BFR*
- **Face**:
- new [PhotoMaker v2](https://huggingface.co/TencentARC/PhotoMaker-V2) and reimplemented [PhotoMaker v1](https://huggingface.co/TencentARC/PhotoMaker)
compatible with sdxl models, generates pretty good results and it's faster than most other methods
select under *scripts -> face -> photomaker*
- new [ReSwapper](https://github.com/somanchiu/ReSwapper)
todo: experimental-only and unfinished, noted in the changelog only for future reference
- **Video**
- **hunyuan video** support for [FastHunyuan](https://huggingface.co/FastVideo/FastHunyuan)
simply select model variant and set appropriate parameters
recommended: sampler-shift=17, steps=6, resolution=720x1280, frames=125, guidance>6.0
- [PAB: Pyramid Attention Broadcast](https://oahzxl.github.io/PAB/)
- speed up generation by caching attention results between steps
- enable in *settings -> pipeline modifiers -> pab*
- adjust settings as needed: a wider timestep range means more acceleration but a higher accuracy drop
- compatible with most `transformer` based models: e.g. flux.1, hunyuan-video, ltx-video, mochi, etc. (a conceptual sketch of the caching idea follows below)
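The core idea can be sketched in a few lines of Python; this is only an illustration of the caching concept, not the actual PAB or SD.Next implementation, and the class and parameter names are invented for the example:

```python
import torch


class BroadcastAttention(torch.nn.Module):
    """Illustrative wrapper: reuse the previous step's attention output for a few
    consecutive denoising steps instead of recomputing it on every step."""

    def __init__(self, attn: torch.nn.Module, reuse_steps: int = 2):
        super().__init__()
        self.attn = attn                # the wrapped attention module
        self.reuse_steps = reuse_steps  # how many consecutive steps may reuse a cached result
        self.cached = None
        self.age = 0

    def forward(self, *args, **kwargs):
        if self.cached is not None and self.age < self.reuse_steps:
            self.age += 1
            return self.cached          # broadcast the cached attention output to this step
        self.cached = self.attn(*args, **kwargs)
        self.age = 0
        return self.cached
```

The actual method additionally restricts reuse to the configured timestep range, which is what the setting described above controls.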
- [ParaAttention](https://github.com/chengzeyi/ParaAttention)
- first-block caching that can significantly speed up generation by dynamically reusing partial outputs between steps
- available for: flux, hunyuan-video, ltx-video, mochi
- enable in *settings -> pipeline modifiers -> para-attention*
- adjust the residual diff threshold to balance speedup and accuracy:
higher values lead to more cache hits and speedups, but may also cause a larger accuracy drop (see the simplified sketch below)
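As a rough illustration of how a residual-diff threshold can gate cache reuse (a simplified sketch with assumed names, not the ParaAttention API):

```python
import torch


def first_block_cache_hit(current: torch.Tensor, previous: torch.Tensor, threshold: float = 0.1) -> bool:
    """If the first transformer block's output changed only slightly since the previous
    step (relative residual below the threshold), cached outputs of the remaining blocks
    can be reused instead of recomputed."""
    residual = (current - previous).abs().mean()
    reference = previous.abs().mean() + 1e-8  # avoid division by zero
    return (residual / reference).item() < threshold


# higher threshold -> more cache hits -> faster generation, but a larger accuracy drop
```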
- **IPEX**
- enable force attention slicing, fp64 emulation, jit cache
- use the US server by default on linux
- use pytorch test branch on windows
- extend the supported python versions
- improve sdpa dynamic attention
- **Torch FP8**
- uses torch `float8_e4m3fn` or `float8_e5m2` as data storage and performs dynamic upcasting to compute `dtype` as needed
- compatible with most `unet` and `transformer` based models: e.g. *sd15, sdxl, sd35, flux.1, hunyuan-video, ltx-video, etc.*
this is an alternative to `bnb`/`quanto`/`torchao` quantization on models/platforms/GPUs where those libraries are not available; a minimal sketch of the idea is shown below
- enable in *settings -> quantization -> layerwise casting*
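A minimal sketch of the storage-vs-compute split (assumes PyTorch 2.1+ for `float8` dtypes; illustrative only, not the exact code path SD.Next uses):

```python
import torch


class FP8Linear(torch.nn.Module):
    """Illustrative layer: weights are stored in float8 and upcast to the compute dtype
    on every forward pass; float8 is used purely as a storage format, no float8 math."""

    def __init__(self, linear: torch.nn.Linear, compute_dtype: torch.dtype = torch.bfloat16):
        super().__init__()
        self.compute_dtype = compute_dtype
        self.register_buffer('weight_fp8', linear.weight.detach().to(torch.float8_e4m3fn))
        self.register_buffer('bias', None if linear.bias is None else linear.bias.detach().to(compute_dtype))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.weight_fp8.to(self.compute_dtype)  # dynamic upcast, released after use
        return torch.nn.functional.linear(x.to(self.compute_dtype), weight, self.bias)
```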
- [PerFlow](https://github.com/magic-research/piecewise-rectified-flow)
- piecewise rectified flow as model acceleration
- use `perflow` scheduler combined with one of the available pre-trained [models](https://huggingface.co/hansyan)
- **Other**:
- **upscale**: new [asymmetric vae](Heasterian/AsymmetricAutoencoderKLUpscaler) upscaling method
- **gallery**: add http fallback for slow/unreliable links
- **splash**: add legacy mode indicator on splash screen
- **network**: extract thumbnail from model metadata if present
- **network**: setting value to disable use of reference models
- **Refactor**:
- **upscale**: code refactor to unify latent, resize and model based upscalers
- **loader**: ability to run in-memory models
- **schedulers**: ability to create model-less schedulers
- **quantization**: code refactor into dedicated module
- **dynamic attention sdpa**: more correct implementation and new trigger rate control
- **Remote access**:
- perform auth check on ui startup
- unified standard and modern-ui authentication methods & cleaned up auth logging
- detect & report local/external/public ip addresses if using `listen` mode
- detect *docker* enforced limits instead of system limits if running in a container (see the illustrative sketch below)
- warn if using public interface without authentication
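As a rough illustration of the difference between container-enforced and system limits, a hypothetical helper (not the actual SD.Next code) might read the cgroup limit directly:

```python
import os


def container_memory_limit_bytes():
    """Hypothetical helper: inside a container the cgroup limit is the effective memory
    ceiling, not the host total reported by the OS."""
    candidates = (
        '/sys/fs/cgroup/memory.max',                    # cgroup v2
        '/sys/fs/cgroup/memory/memory.limit_in_bytes',  # cgroup v1
    )
    for path in candidates:
        if os.path.exists(path):
            with open(path, encoding='utf-8') as f:
                value = f.read().strip()
            if value.isdigit():  # cgroup v2 reports 'max' when unlimited
                return int(value)
    return None  # no enforced limit found (likely not running in a container)
```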
- **Fixes**:
- non-full vae decode
- send-to image transfer
- sana vae tiling
- increase gallery timeouts
- update ui element ids
- modernui use local font
- unique font family registration
- mochi video number of frames
- mark large models that should offload
- avoid repeated optimum-quanto installation
- avoid reinstalling bnb if not cuda
- image metadata civitai compatibility
- xyz grid handle invalid values
- omnigen pipeline handle float seeds
- correct logging of docker status, thanks @kmscode
- fix omnigen
- fix docker status reporting
- vlm/vqa with moondream2
- rocm do not override triton installation
- port streaming model load to diffusers

## Update for 2025-01-15

25 changes: 16 additions & 9 deletions CONTRIBUTING
@@ -4,17 +4,24 @@ Pull requests from everyone are welcome

Procedure for contributing:

- Select SD.Next `dev` branch:
  <https://github.com/vladmandic/automatic/tree/dev>
- Create a fork of the repository on GitHub
  In the top right corner of GitHub, select "Fork"
  It's recommended to fork the latest version from the main branch to avoid any possible conflicting code updates
- Clone your forked repository to your local system
  `git clone https://github.com/<your-username>/<your-fork>`
- Make your changes
- Test your changes
- Lint your changes against code guidelines
  - `ruff check`
  - `pylint <folder>/<filename>.py`
- Push changes to your fork
- Submit a PR (pull request)
- Make sure that the PR is against the `dev` branch
- Update your fork before creating the PR so that it is based on the latest code
- Make sure that the PR does NOT include any unrelated edits
- Make sure that the PR does not include changes to submodules

Your pull request will be reviewed and, pending review results, merged into the `dev` branch
Dev merges to main are performed regularly and any PRs that are merged to `dev` will be included in the next main release
15 changes: 7 additions & 8 deletions README.md
@@ -17,7 +17,7 @@

- [Documentation](https://vladmandic.github.io/sdnext-docs/)
- [SD.Next Features](#sdnext-features)
- [Model support](#model-support)
- [Platform support](#platform-support)
- [Getting started](#getting-started)

@@ -32,7 +32,7 @@ All individual features are not listed here, instead check [ChangeLog](CHANGELOG
**Windows | Linux | MacOS | nVidia | AMD | IntelArc/IPEX | DirectML | OpenVINO | ONNX+Olive | ZLUDA**
- Platform specific autodetection and tuning performed on install
- Optimized processing with latest `torch` developments with built-in support for model compile, quantize and compress
Compile backends: *Triton | StableFast | DeepCache | OneDiff | TeaCache | etc.*
Quantization and compression methods: *BitsAndBytes | TorchAO | Optimum-Quanto | NNCF*
- Built-in queue management
- Built in installer with automatic updates and dependency management
@@ -82,6 +82,11 @@ SD.Next supports broad range of models: [supported models](https://vladmandic.gi
> [!WARNING]
> If you run into issues, check out [troubleshooting](https://vladmandic.github.io/sdnext-docs/Troubleshooting/) and [debugging](https://vladmandic.github.io/sdnext-docs/Debug/) guides
### Contributing

Please see [Contributing](CONTRIBUTING) for details on how to contribute to this project
For any questions, reach out on [Discord](https://discord.gg/VjvR2tabEX) or open an [issue](https://github.com/vladmandic/automatic/issues) or a [discussion](https://github.com/vladmandic/automatic/discussions)

### Credits

- Main credit goes to [Automatic1111 WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) for the original codebase
@@ -104,10 +109,4 @@ SD.Next supports broad range of models: [supported models](https://vladmandic.gi
If you're unsure how to use a feature, the best place to start is [Docs](https://vladmandic.github.io/sdnext-docs/) and if it's not there,
check [ChangeLog](https://vladmandic.github.io/sdnext-docs/CHANGELOG/) for when the feature was first introduced as it will always have a short note on how to use it

### Sponsors

<div align="center">
<!-- sponsors --><a href="https://github.com/allangrant"><img src="https://github.com/allangrant.png" width="60px" alt="Allan Grant" /></a><a href="https://github.com/mantzaris"><img src="https://github.com/mantzaris.png" width="60px" alt="a.v.mantzaris" /></a><a href="https://github.com/CurseWave"><img src="https://github.com/CurseWave.png" width="60px" alt="" /></a><a href="https://github.com/smlbiobot"><img src="https://github.com/smlbiobot.png" width="60px" alt="SML (See-ming Lee)" /></a><!-- sponsors -->
</div>

<br>
2 changes: 1 addition & 1 deletion extensions-builtin/sd-extension-system-info
56 changes: 45 additions & 11 deletions installer.py
@@ -447,7 +447,7 @@ def get_platform():
'system': platform.system(),
'release': release,
'python': platform.python_version(),
'docker': os.environ.get('SD_INSTALL_DEBUG', None) is not None,
'docker': os.environ.get('SD_DOCKER', None) is not None,
# 'host': platform.node(),
# 'version': platform.version(),
}
@@ -492,7 +492,7 @@ def check_diffusers():
t_start = time.time()
if args.skip_all or args.skip_git:
return
sha = 'b785ddb654e4be3ae0066e231734754bdb2a191c' # diffusers commit hash
sha = '7b100ce589b917d4c116c9e61a6ec46d4f2ab062' # diffusers commit hash
pkg = pkg_resources.working_set.by_key.get('diffusers', None)
minor = int(pkg.version.split('.')[1] if pkg is not None else 0)
cur = opts.get('diffusers_version', '') if minor > 0 else ''
@@ -625,6 +625,9 @@ def install_rocm_zluda():
else:
torch_command = os.environ.get('TORCH_COMMAND', f'torch torchvision --index-url https://download.pytorch.org/whl/rocm{rocm.version}')

if os.environ.get('TRITON_COMMAND', None) is None:
os.environ.setdefault('TRITON_COMMAND', 'skip') # pytorch auto installs pytorch-triton-rocm as a dependency instead

if sys.version_info < (3, 11):
ort_version = os.environ.get('ONNXRUNTIME_VERSION', None)
if rocm.version is None or float(rocm.version) > 6.0:
@@ -659,22 +662,39 @@

def install_ipex(torch_command):
t_start = time.time()
check_python(supported_minors=[10,11], reason='IPEX backend requires Python 3.10 or 3.11')
# Python 3.12 will cause compatibility issues with other dependencies
# IPEX supports Python 3.12 so don't block it but don't advertise it in the error message
check_python(supported_minors=[9, 10, 11, 12], reason='IPEX backend requires Python 3.9, 3.10 or 3.11')
args.use_ipex = True # pylint: disable=attribute-defined-outside-init
log.info('IPEX: Intel OneAPI toolkit detected')

if os.environ.get("NEOReadDebugKeys", None) is None:
os.environ.setdefault('NEOReadDebugKeys', '1')
if os.environ.get("ClDeviceGlobalMemSizeAvailablePercent", None) is None:
os.environ.setdefault('ClDeviceGlobalMemSizeAvailablePercent', '100')
if os.environ.get("SYCL_CACHE_PERSISTENT", None) is None:
os.environ.setdefault('SYCL_CACHE_PERSISTENT', '1') # Jit cache

if os.environ.get("PYTORCH_ENABLE_XPU_FALLBACK", None) is None:
os.environ.setdefault('PYTORCH_ENABLE_XPU_FALLBACK', '1')
os.environ.setdefault('PYTORCH_ENABLE_XPU_FALLBACK', '1') # CPU fallback for unsupported ops
if os.environ.get("OverrideDefaultFP64Settings", None) is None:
os.environ.setdefault('OverrideDefaultFP64Settings', '1')
if os.environ.get("IGC_EnableDPEmulation", None) is None:
os.environ.setdefault('IGC_EnableDPEmulation', '1') # FP64 Emulation
if os.environ.get('IPEX_FORCE_ATTENTION_SLICE', None) is None:
# XPU PyTorch doesn't support Flash Atten or Memory Atten yet so Battlemage goes OOM without this
os.environ.setdefault('IPEX_FORCE_ATTENTION_SLICE', '1')

if "linux" in sys.platform:
torch_command = os.environ.get('TORCH_COMMAND', 'torch==2.5.1+cxx11.abi torchvision==0.20.1+cxx11.abi intel-extension-for-pytorch==2.5.10+xpu oneccl_bind_pt==2.5.0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/')
# torch_command = os.environ.get('TORCH_COMMAND', 'torch torchvision --index-url https://download.pytorch.org/whl/test/xpu') # test wheels are stable previews, significantly slower than IPEX
# os.environ.setdefault('TENSORFLOW_PACKAGE', 'tensorflow==2.15.1 intel-extension-for-tensorflow[xpu]==2.15.0.1')
# default to US server. If The China server is needed, change .../release-whl/stable/xpu/us/ to .../release-whl/stable/xpu/cn/
torch_command = os.environ.get('TORCH_COMMAND', 'torch==2.5.1+cxx11.abi torchvision==0.20.1+cxx11.abi intel-extension-for-pytorch==2.5.10+xpu oneccl_bind_pt==2.5.0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/')
if os.environ.get('TRITON_COMMAND', None) is None:
os.environ.setdefault('TRITON_COMMAND', '--pre pytorch-triton-xpu==3.1.0+91b14bf559 --index-url https://download.pytorch.org/whl/nightly/xpu')
# os.environ.setdefault('TENSORFLOW_PACKAGE', 'tensorflow==2.15.1 intel-extension-for-tensorflow[xpu]==2.15.0.2')
else:
torch_command = os.environ.get('TORCH_COMMAND', '--pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/xpu') # torchvision doesn't exist on test/stable branch for windows
install(os.environ.get('OPENVINO_PACKAGE', 'openvino==2024.5.0'), 'openvino', ignore=True)
torch_command = os.environ.get('TORCH_COMMAND', 'torch==2.6.0+xpu torchvision==0.21.0+xpu --index-url https://download.pytorch.org/whl/test/xpu')

install(os.environ.get('OPENVINO_PACKAGE', 'openvino==2024.6.0'), 'openvino', ignore=True)
install('nncf==2.7.0', ignore=True, no_deps=True) # requires older pandas
install(os.environ.get('ONNXRUNTIME_PACKAGE', 'onnxruntime-openvino'), 'onnxruntime-openvino', ignore=True)
ts('ipex', t_start)
@@ -683,6 +703,8 @@ def install_ipex(torch_command):

def install_openvino(torch_command):
t_start = time.time()
# Python 3.12 will cause compatibility issues with other dependencies.
# OpenVINO supports Python 3.12 so don't block it but don't advertise it in the error message
check_python(supported_minors=[9, 10, 11, 12], reason='OpenVINO backend requires Python 3.9, 3.10 or 3.11')
log.info('OpenVINO: selected')
if sys.platform == 'darwin':
@@ -726,11 +748,22 @@ def install_torch_addons():
install('optimum-quanto==0.2.6', 'optimum-quanto')
if not args.experimental:
uninstall('wandb', quiet=True)
if triton_command is not None:
if triton_command is not None and triton_command != 'skip':
install(triton_command, 'triton', quiet=True)
ts('addons', t_start)


# check cudnn
def check_cudnn():
import site
site_packages = site.getsitepackages()
cuda_path = os.environ.get('CUDA_PATH', '')
for site_package in site_packages:
folder = os.path.join(site_package, 'nvidia', 'cudnn', 'lib')
if os.path.exists(folder) and folder not in cuda_path:
os.environ['CUDA_PATH'] = f"{cuda_path}:{folder}"


# check torch version
def check_torch():
t_start = time.time()
@@ -842,6 +875,7 @@ def check_torch():
return
if not args.skip_all:
install_torch_addons()
check_cudnn()
if args.profile:
pr.disable()
print_profile(pr, 'Torch')
@@ -1056,7 +1090,7 @@ def install_optional():
install('gfpgan')
install('clean-fid')
install('pillow-jxl-plugin==1.3.1', ignore=True)
install('optimum-quanto=0.2.6', ignore=True)
install('optimum-quanto==0.2.6', ignore=True)
install('bitsandbytes==0.45.0', ignore=True)
install('pynvml', ignore=True)
install('ultralytics==8.3.40', ignore=True)
4 changes: 2 additions & 2 deletions javascript/base.css
@@ -1,4 +1,4 @@
@font-face { font-family: 'NotoSans'; font-display: swap; font-style: normal; font-weight: 100; src: local('NotoSans'), url('notosans-nerdfont-regular.ttf') }
@font-face { font-family: 'NotoSans'; font-display: swap; font-style: normal; font-weight: 100; src: local('NotoSansNerd'), url('notosans-nerdfont-regular.ttf') }

/* toolbutton */
.gradio-button.tool { max-width: min-content; min-width: min-content !important; align-self: end; font-size: 1.4em; color: var(--body-text-color) !important; }
@@ -77,7 +77,7 @@ table.settings-value-table td { padding: 0.4em; border: 1px solid #ccc; max-widt
#extensions .info { margin: 0; }
#extensions .date { opacity: 0.85; font-size: 90%; }

/* extra networks */
/* networks */
.extra-networks > div { margin: 0; border-bottom: none !important; }
.extra-networks .second-line { display: flex; width: -moz-available; width: -webkit-fill-available; gap: 0.3em; box-shadow: var(--input-shadow); margin-bottom: 2px; }
.extra-networks .search { flex: 1; }
2 changes: 1 addition & 1 deletion javascript/black-gray.css
@@ -1,5 +1,5 @@
/* generic html tags */
@font-face { font-family: 'NotoSans'; font-display: swap; font-style: normal; font-weight: 100; src: local('NotoSans'), url('notosans-nerdfont-regular.ttf') }
@font-face { font-family: 'NotoSans'; font-display: swap; font-style: normal; font-weight: 100; src: local('NotoSansNerd'), url('notosans-nerdfont-regular.ttf') }
:root, .light, .dark {
--font: 'NotoSans';
--font-mono: 'ui-monospace', 'Consolas', monospace;