Motivation

In DLLM (Diffusion LLM) mode, tokens are generated in blocks and progressively unmasked in a non-sequential order. Currently, there is no way to track the decoding step at which each token was revealed. This information is valuable for:

  • Analyzing DLLM decoding efficiency
  • Understanding the non-sequential token generation pattern
  • Debugging and optimizing unmasking strategies
  • Research on speculative decoding performance

This PR adds a step_map field to track the decoding step number for each generated token.

Modification

This PR introduces a step_map feature that records the step at which each token was decoded in DLLM mode:

  1. Core tracking logic (lmdeploy/pytorch/strategies/dllm/sequence.py):

    • Added history_step_map field to SchedulerSequenceDLLM to store step numbers
    • Added _current_step counter to track decoding steps
    • Added step_map and generated_step_map properties
    • Updated _update_token_ids_decode() to record step numbers when tokens transition from MASKED to UNMASKED
    • Step counter only increments when new tokens are actually unmasked (see the sketch after this list)
  2. Engine layer (lmdeploy/pytorch/engine/engine.py):

    • Added step_map field to InferOutput dataclass
    • Extracted step_map from messages in _make_infer_outputs()
    • Propagated step_map through response data
  3. Instance layer (lmdeploy/pytorch/engine/engine_instance.py):

    • Extracted and passed step_map to EngineOutput
  4. API layer (lmdeploy/messages.py):

    • Added step_map field to Response dataclass
    • Added step_map field to EngineOutput dataclass
    • Updated Response.__repr__() to display step_map
  5. Async engine layer (lmdeploy/serve/async_engine.py):

    • Added step_map field to GenOut dataclass
    • Updated _gen_out_to_response() to pass step_map
    • Updated _append_response() to accumulate step_map across iterations
    • Extracted the incremental step_map from engine outputs in the generation loop
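
As a rough illustration of item 1, the tracking idea can be sketched as a small, self-contained Python example. This is not the actual SchedulerSequenceDLLM implementation: the MASKED sentinel, the StepTrackerSketch class, and its update() method are simplified stand-ins, and only the names history_step_map and _current_step are taken from this PR:

from typing import List

MASKED = -1  # simplified stand-in for a still-masked token position

class StepTrackerSketch:
    """Toy tracker mirroring the history_step_map / _current_step behavior."""

    def __init__(self, block_length: int):
        self.token_ids: List[int] = [MASKED] * block_length
        self.history_step_map: List[int] = [0] * block_length
        self._current_step = 0

    def update(self, new_token_ids: List[int]) -> None:
        # Positions that transition from MASKED to UNMASKED in this update.
        newly_unmasked = [
            i for i, (old, new) in enumerate(zip(self.token_ids, new_token_ids))
            if old == MASKED and new != MASKED
        ]
        if newly_unmasked:
            # The step counter only advances when at least one token is revealed.
            self._current_step += 1
            for i in newly_unmasked:
                self.history_step_map[i] = self._current_step
        self.token_ids = list(new_token_ids)

tracker = StepTrackerSketch(block_length=4)
tracker.update([11, MASKED, 13, 14])  # first step reveals positions 0, 2, 3
tracker.update([11, 12, 13, 14])      # second step reveals position 1
print(tracker.history_step_map)       # -> [1, 2, 1, 1]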

How it works:

  • Each token gets a step number indicating when it was unmasked (1, 2, 3, ...)
  • The step_map array has the same length as the generated tokens
  • The non-sequential values in step_map reflect DLLM's parallel decoding behavior (illustrated below)
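
As a quick illustration of that pattern, positions can be grouped by step. The step_map values below are hypothetical and only follow the format shown in the usage example later in this PR:

from collections import defaultdict

# Hypothetical step_map for eight generated tokens.
step_map = [1, 2, 1, 1, 3, 3, 3, 3]

positions_by_step = defaultdict(list)
for position, step in enumerate(step_map):
    positions_by_step[step].append(position)

print(dict(positions_by_step))  # -> {1: [0, 2, 3], 2: [1], 3: [4, 5, 6, 7]}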

BC-breaking (Optional)

No breaking changes. This is a backward-compatible addition:

  • New optional step_map field defaults to None in all dataclasses
  • Existing code will continue to work without modification
  • Only DLLM mode populates step_map; other modes return None

Use cases (Optional)

Example usage:

from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig

# Configure DLLM
backend_config = PytorchEngineConfig(
    dllm_block_length=4,
    dllm_unmasking_strategy="low_confidence_dynamic",
)

model_path = "path/to/dllm/model"  # placeholder: any DLLM-capable model
with pipeline(model_path, backend_config=backend_config) as pipe:
    gen_config = GenerationConfig(max_new_tokens=100)
    outputs = pipe(["Hello"], gen_config=gen_config)
    
    for output in outputs:
        if output.step_map is not None:
            print(f"Tokens: {output.token_ids}")
            print(f"Step map: {output.step_map}")
            # Example: [1, 2, 1, 1, 3, 3, 3, 3, ...]
            # Shows non-sequential unmasking pattern

Analysis example:

from collections import Counter

# Analyze decoding efficiency, reusing the output object from the example above
step_counts = Counter(output.step_map)
for step in sorted(step_counts.keys()):
    print(f"Step {step}: {step_counts[step]} tokens decoded")

This helps researchers:

  • Measure average tokens decoded per step (see the snippet below)
  • Evaluate unmasking strategy effectiveness
  • Compare different DLLM configurations
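
For example, a rough tokens-per-step figure can be computed as follows, assuming output.step_map from the pipeline example above is populated and that steps are numbered consecutively from 1:

step_map = output.step_map             # from the pipeline example above
num_steps = max(step_map)              # highest step index observed
avg_tokens_per_step = len(step_map) / num_steps
print(f"{len(step_map)} tokens in {num_steps} steps "
      f"({avg_tokens_per_step:.2f} tokens/step on average)")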

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

- Add step_map field to track the step at which each token was decoded
- step_map records the step number when tokens transition from MASKED to UNMASKED
- Propagate step_map through engine outputs to final Response
- Useful for analyzing DLLM decoding efficiency and token generation order

Auraithm (Author) commented Nov 4, 2025

Any new developments? We will soon release the related training framework, which depends on this branch.
