Conversation

@kyuyeunk (Collaborator) commented Nov 7, 2025

Description

Support bias and SwiGLU activation in the MoE layer.

Additionally, this PR fixes an issue where the program fails when attempting to load gpt-oss with expert parallelism.
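For context, here is a minimal sketch of what a biased SwiGLU expert computes, in plain PyTorch. The tensor names and the fused gate/up layout are illustrative assumptions, not the exact layout this PR implements:

```python
import torch
import torch.nn.functional as F

def swiglu_expert(x, w_gate_up, b_gate_up, w_down, b_down):
    # x: [tokens, hidden]; w_gate_up: [hidden, 2 * intermediate] with the
    # gate and up halves fused; w_down: [intermediate, hidden].
    gate_up = x @ w_gate_up + b_gate_up   # biased fused gate/up projection
    gate, up = gate_up.chunk(2, dim=-1)   # split the fused halves
    hidden = F.silu(gate) * up            # SwiGLU: silu(gate) * up
    return hidden @ w_down + b_down       # biased down projection
```

In a full MoE layer this computation runs once per routed expert, with the router's weights combining the expert outputs.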

Tests

VLLM_DISABLE_SHARED_EXPERTS_STREAM=1 MODEL_IMPL_TYPE=vllm vllm serve --model=unsloth/gpt-oss-120b-BF16  --max-model-len=8192 --max-num-batched-tokens 1024 --max-num-seqs=256 --no-enable-prefix-caching --disable-log-requests --gpu-memory-utilization 0.8 --tensor-parallel-size 8 --enable-expert-parallel
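The command above enables --enable-expert-parallel, the configuration under which loading previously failed. As a hypothetical illustration of the kind of per-expert slicing that expert-parallel loading requires (the function name and the even-split scheme are assumptions, not the PR's actual fix):

```python
import torch

def shard_expert_param(param: torch.Tensor, ep_rank: int, ep_size: int) -> torch.Tensor:
    # param: [num_experts, ...], e.g. a per-expert bias of shape
    # [num_experts, 2 * intermediate]. Each EP rank keeps only its slice.
    num_experts = param.shape[0]
    assert num_experts % ep_size == 0, "assumes experts divide evenly across ranks"
    per_rank = num_experts // ep_size
    start = ep_rank * per_rank
    return param[start:start + per_rank]  # this rank's local experts only
```

If a loader indexes the full [num_experts, ...] tensor without such per-rank slicing, shapes no longer line up once expert parallelism is enabled, which is one way such a load can fail.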

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have added necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

github-actions bot commented Nov 7, 2025

Description

Start with a short description of what the PR does and how this is a change from
the past.

The rest of the description includes relevant details and context, examples:

  • why is this change being made,
  • the problem being solved and any relevant context,
  • why this is a good solution,
  • some information about the specific implementation,
  • shortcomings of the solution and possible future improvements.

If the change fixes a bug or a GitHub issue, please include a link, e.g.:
FIXES: b/123456
FIXES: #123456

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have added necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

@OhadRubin

@kyuyeunk when can we expect kernel support and fully functional gpt-oss-120b inference?

@kyuyeunk (Collaborator, Author) commented Nov 7, 2025

> @kyuyeunk when can we expect kernel support and fully functional gpt-oss-120b inference?

For the unquantized version, I would expect the torchax path to fully support it by the end of this week.

I'm uncertain about the JAX path, because I foresee heavy refactoring, not just to make the kernel work but also to add optimizations that fully take advantage of it.

@kyuyeunk force-pushed the load_moe_bias branch 2 times, most recently from 544b317 to e0e7ffd on November 8, 2025 at 10:20.
@kyuyeunk changed the title from "[Torchax] Add ability to load MoE bias" to "[Torchax] Support bias and swiglu in MoE" on November 8, 2025.
@kyuyeunk force-pushed the load_moe_bias branch 3 times, most recently from 30e412a to 622d07a on November 9, 2025 at 01:51.
@kyuyeunk merged commit 6d0c11c into main on November 10, 2025.
3 checks passed