Objective
The goal is to identify and implement the most impactful optimizations for improving the performance of AI models, focusing on inference speed and efficient VRAM usage while also keeping an eye on output quality.
Current Optimizations
The following optimizations are already integrated into our codebase:
Half Precision: Half-precision (fp16) weights are used to speed up inference and reduce memory consumption; implemented in the ai-worker/image_to_video pipeline (see the first sketch after this list).
SFAST (xformers & Triton): Adopted from the stable-fast project, this currently speeds up inference and may also reduce memory usage in future releases. See the implementation in the ai-worker/sfast pipeline (second sketch below).
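For illustration, loading an image-to-video pipeline with half-precision weights in Diffusers might look roughly like the sketch below. The model ID is an assumption for illustration; the actual pipeline wiring lives in ai-worker.

```python
import torch
from diffusers import StableVideoDiffusionPipeline

# Assumed model ID for illustration; ai-worker's pipeline setup may differ.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,  # run with half-precision weights
    variant="fp16",             # fetch the fp16 checkpoint variant
)
pipe = pipe.to("cuda")
```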
Future Explorations
CPU Offloading: @Titan-Node is investigating whether (sequential) CPU offloading can decrease memory usage by keeping model components off the GPU until they are needed, as described in the CPU offloading optimization (see the sketch below).
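As a rough sketch of what that could look like with Diffusers (reusing the illustrative model ID from above):

```python
import torch
from diffusers import StableVideoDiffusionPipeline

# Assumed model ID for illustration only.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
# Sequential offloading moves each submodule to the GPU only for its
# forward pass, cutting peak VRAM at the cost of slower inference.
# Note: do not call pipe.to("cuda") first; accelerate manages placement.
pipe.enable_sequential_cpu_offload()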
We're exploring various optimizations available in the Diffusers library to improve VRAM usage and inference speed. @Titan-Node is benchmarking these optimizations across his GPU pool and the Livepeer network using his ai-benchmark wrapper to evaluate their effectiveness. Preliminary results are documented in this community spreadsheet.
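This is not the ai-benchmark wrapper itself, but a minimal sketch of the kind of measurement involved, capturing wall-clock time plus peak VRAM via PyTorch's CUDA memory stats:

```python
import time
import torch

def benchmark(pipe, **call_kwargs):
    """Rough single-run measurement of inference time and peak VRAM.

    Generic sketch only (not the ai-benchmark wrapper); real runs should
    average several iterations after a warm-up pass.
    """
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    pipe(**call_kwargs)
    torch.cuda.synchronize()
    elapsed_s = time.perf_counter() - start
    peak_vram_gb = torch.cuda.max_memory_allocated() / 1024**3
    return elapsed_s, peak_vram_gb
```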
Links and Resources