Motivation

In DLLM (Diffusion LLM) mode, tokens are generated in blocks and progressively unmasked in a non-sequential order. Currently, there is no way to track the decoding step at which each token was revealed. This information is valuable for:

  • Analyzing DLLM decoding efficiency
  • Understanding the non-sequential token generation pattern
  • Debugging and optimizing unmasking strategies
  • Research on speculative decoding performance

This PR adds a step_map field to track the decoding step number for each generated token.

Modification

This PR introduces a step_map feature that records the step at which each token was decoded in DLLM mode:

  1. Core tracking logic (lmdeploy/pytorch/strategies/dllm/sequence.py):

    • Added history_step_map field to SchedulerSequenceDLLM to store step numbers
    • Added _current_step counter to track decoding steps
    • Added step_map and generated_step_map properties
    • Updated _update_token_ids_decode() to record step numbers when tokens transition from MASKED to UNMASKED
    • Step counter only increments when new tokens are actually unmasked (see the sketch after this list)
  2. Engine layer (lmdeploy/pytorch/engine/engine.py):

    • Added step_map field to InferOutput dataclass
    • Extracted step_map from messages in _make_infer_outputs()
    • Propagated step_map through response data
  3. Instance layer (lmdeploy/pytorch/engine/engine_instance.py):

    • Extracted and passed step_map to EngineOutput
  4. API layer (lmdeploy/messages.py):

    • Added step_map field to Response dataclass
    • Added step_map field to EngineOutput dataclass
    • Updated Response.__repr__() to display step_map
  5. Async engine layer (lmdeploy/serve/async_engine.py):

    • Added step_map field to GenOut dataclass
    • Updated _gen_out_to_response() to pass step_map
    • Updated _append_response() to accumulate step_map across iterations
    • Extracted the incremental step_map from engine outputs in the generation loop
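
As a rough illustration of item 1, the tracking idea can be sketched as a small, self-contained Python example. This is not the actual SchedulerSequenceDLLM implementation: the MASKED sentinel, the StepTrackerSketch class, and its update() method are simplified stand-ins, and only the names history_step_map and _current_step are taken from this PR:

from typing import List

MASKED = -1  # simplified stand-in for a still-masked token position

class StepTrackerSketch:
    """Toy tracker mirroring the history_step_map / _current_step behavior."""

    def __init__(self, block_length: int):
        self.token_ids: List[int] = [MASKED] * block_length
        self.history_step_map: List[int] = [0] * block_length
        self._current_step = 0

    def update(self, new_token_ids: List[int]) -> None:
        # Positions that transition from MASKED to UNMASKED in this update.
        newly_unmasked = [
            i for i, (old, new) in enumerate(zip(self.token_ids, new_token_ids))
            if old == MASKED and new != MASKED
        ]
        if newly_unmasked:
            # The step counter only advances when at least one token is revealed.
            self._current_step += 1
            for i in newly_unmasked:
                self.history_step_map[i] = self._current_step
        self.token_ids = list(new_token_ids)

tracker = StepTrackerSketch(block_length=4)
tracker.update([11, MASKED, 13, 14])  # first step reveals positions 0, 2, 3
tracker.update([11, 12, 13, 14])      # second step reveals position 1
print(tracker.history_step_map)       # -> [1, 2, 1, 1]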

How it works:

  • Each token gets a step number indicating when it was unmasked (1, 2, 3, ...)
  • The step_map array has the same length as the generated tokens
  • The non-sequential values in step_map reflect DLLM's parallel decoding behavior (illustrated below)
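
As a quick illustration of that pattern, positions can be grouped by step. The step_map values below are hypothetical and only follow the format shown in the usage example later in this PR:

from collections import defaultdict

# Hypothetical step_map for eight generated tokens.
step_map = [1, 2, 1, 1, 3, 3, 3, 3]

positions_by_step = defaultdict(list)
for position, step in enumerate(step_map):
    positions_by_step[step].append(position)

print(dict(positions_by_step))  # -> {1: [0, 2, 3], 2: [1], 3: [4, 5, 6, 7]}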

BC-breaking (Optional)

No breaking changes. This is a backward-compatible addition:

  • New optional step_map field defaults to None in all dataclasses
  • Existing code will continue to work without modification
  • Only DLLM mode populates step_map; other modes return None

Use cases (Optional)

Example usage:

from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig

# Configure DLLM
backend_config = PytorchEngineConfig(
    dllm_block_length=4,
    dllm_unmasking_strategy="low_confidence_dynamic",
)

model_path = "path/to/dllm/model"  # placeholder: any DLLM-capable model
with pipeline(model_path, backend_config=backend_config) as pipe:
    gen_config = GenerationConfig(max_new_tokens=100)
    outputs = pipe(["Hello"], gen_config=gen_config)
    
    for output in outputs:
        if output.step_map is not None:
            print(f"Tokens: {output.token_ids}")
            print(f"Step map: {output.step_map}")
            # Example: [1, 2, 1, 1, 3, 3, 3, 3, ...]
            # Shows non-sequential unmasking pattern

Analysis example:

from collections import Counter

# Analyze decoding efficiency, reusing the output object from the example above
step_counts = Counter(output.step_map)
for step in sorted(step_counts.keys()):
    print(f"Step {step}: {step_counts[step]} tokens decoded")

This helps researchers:

  • Measure average tokens decoded per step (see the snippet below)
  • Evaluate unmasking strategy effectiveness
  • Compare different DLLM configurations
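
For example, a rough tokens-per-step figure can be computed as follows, assuming output.step_map from the pipeline example above is populated and that steps are numbered consecutively from 1:

step_map = output.step_map             # from the pipeline example above
num_steps = max(step_map)              # highest step index observed
avg_tokens_per_step = len(step_map) / num_steps
print(f"{len(step_map)} tokens in {num_steps} steps "
      f"({avg_tokens_per_step:.2f} tokens/step on average)")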

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

- Add step_map field to track the step at which each token was decoded
- step_map records the step number when tokens transition from MASKED to UNMASKED
- Propagate step_map through engine outputs to final Response
- Useful for analyzing DLLM decoding efficiency and token generation order

Auraithm (Author) commented Nov 4, 2025

Any new developments? We will soon release the related training framework, which depends on this branch.
