Introduce Liger Fused Cross Entropy Kernel to FOAK Plugin #76
Comments
@achew010 currently we only install the non-chunked CE loss from unsloth. It seems to be OK for llama3, but scaling up, we should consider handling chunking also.
Considerations for introducing FusedCrossEntropyLoss to FMS-Acceleration: Liger's FCELoss combines the LM head matmul with the CrossEntropyLoss kernel into a single operation, and we can keep the additional FusedCrossEntropyLoss code inside the FOAK plugin. There are 3 approaches to apply FusedCrossEntropyLoss to FMS-Acceleration; the first 2 options have tradeoffs in terms of maintainability and reliability, while the 3rd option requires additional documentation as there are certain nuances that are not easy to understand.
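For intuition, below is a minimal PyTorch sketch of the chunking idea behind a fused linear + cross-entropy loss. It is not the Liger Triton kernel (which fuses the matmul and the loss and avoids storing logits for the backward pass); the function name `chunked_linear_cross_entropy` and the `chunk_size` default are illustrative assumptions, shown only to clarify why the full `[num_tokens, vocab_size]` logits tensor never needs to be materialised at once.

```python
# Sketch only: chunk the LM-head matmul + cross-entropy so the full
# [num_tokens, vocab_size] logits tensor is never materialised at once.
# The real Liger FusedCrossEntropyLoss fuses these steps in a Triton kernel.
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(
    hidden: torch.Tensor,          # [num_tokens, hidden_dim]
    lm_head_weight: torch.Tensor,  # [vocab_size, hidden_dim]
    labels: torch.Tensor,          # [num_tokens]
    chunk_size: int = 4096,        # illustrative default
    ignore_index: int = -100,
) -> torch.Tensor:
    total_loss = hidden.new_zeros(())
    num_valid = (labels != ignore_index).sum().clamp(min=1)
    for start in range(0, hidden.size(0), chunk_size):
        h = hidden[start:start + chunk_size]
        y = labels[start:start + chunk_size]
        # only a [chunk_size, vocab_size] slice of the logits exists here
        logits = h @ lm_head_weight.t()
        total_loss = total_loss + F.cross_entropy(
            logits.float(), y, ignore_index=ignore_index, reduction="sum"
        )
    return total_loss / num_valid
```

The unfused path would compute `F.cross_entropy(hidden @ lm_head_weight.t(), labels)` in one shot, materialising all logits at once, which is what dominates memory at large vocabulary sizes.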
Implemented (2) in PR #93, but as noted above it suffers from maintainability issues.
Worked on implementing (3) via PR https://github.com/foundation-model-stack/fms-acceleration/compare/main...anhuong:fms-acceleration:fused-cross-entropyloss?expand=1, however the model patching rules aren't quite right there. Thus it could be worth implementing (2) via the PR for just transformers v4.44/4.43, and then implementing the new solution with the model patching changes needed from #98.
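For readers unfamiliar with the patching approach, the sketch below shows the general forward-patching idea in plain PyTorch. It is not fms-acceleration's actual patching rules or API, and the HF llama-style attribute layout (`model.model` backbone, `model.lm_head`) is an assumption.

```python
# Illustrative only: a generic monkey-patch that reroutes a causal-LM forward
# through a fused/chunked linear + cross-entropy loss (e.g. the sketch above),
# instead of materialising full logits and calling nn.CrossEntropyLoss.
# Assumes an HF llama-style layout: `model.model` backbone and `model.lm_head`.
import types
import torch

def install_fused_ce(model, fused_linear_ce):
    backbone_forward = model.model.forward  # backbone returns hidden states

    def forward(self, input_ids=None, labels=None, **kwargs):
        hidden = backbone_forward(input_ids=input_ids, **kwargs)[0]
        loss = None
        if labels is not None:
            # shift so each position predicts the next token (causal LM)
            shift_hidden = hidden[..., :-1, :].reshape(-1, hidden.size(-1))
            shift_labels = labels[..., 1:].reshape(-1)
            loss = fused_linear_ce(shift_hidden, self.lm_head.weight, shift_labels)
        return {"loss": loss, "hidden_states": hidden}

    model.forward = types.MethodType(forward, model)
    return model
```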
@fabianlim do you want to close this issue since we merged the Liger PR, or leave it open while we work towards the v2?
Description
Consider adding an additional FusedCrossEntropyLoss kernel to the FOAK set of kernels, given the additional improvement seen when using it in earlier tests (see Background below).
Considerations:
Background
A comparison of the current FOAK kernels against the kernels from Liger, using Liger's full FT benchmark script with the following parameters:
4 Triton kernels are activated in the comparison against the FOAK equivalents:
The benchmarks report the following metrics:
- `avg_tokens_per_sec`: total input tokens seen by the model divided by the total runtime (secs) of each run.
- `total_peak_allocated_memory`: total peak allocated GPU memory in MB.

We observe that the FOAK kernels match Liger in both speed and memory consumption with all 4 kernels (using the unfused CrossEntropyLoss kernel), but Liger performs better with FusedCrossEntropyLoss.
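A rough sketch of how these two metrics could be collected around a training loop; the helper name and the `model_step`/`dataloader` arguments are assumptions, not the actual Liger benchmark script.

```python
# Sketch of collecting the two reported metrics around a training loop.
import time
import torch

def run_benchmark(model_step, dataloader):
    """`model_step(batch)` is assumed to run one forward/backward/optimizer step."""
    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    total_tokens = 0
    for batch in dataloader:
        total_tokens += batch["input_ids"].numel()
        model_step(batch)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return {
        # total input tokens / total runtime (secs)
        "avg_tokens_per_sec": total_tokens / elapsed,
        # total peak allocated GPU memory, in MB
        "total_peak_allocated_memory": torch.cuda.max_memory_allocated() / (1024 ** 2),
    }
```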
Additional Notes
Extracted from fms-acceleration FOAK slides