Changes LLMRayActor to use vllm's AsyncLLMEngine #1016
base: main
Conversation
…tures are complete.
…list[CompletionOutput].
…ration
The issue was that after a tool call, we would loop back and try to generate again with the same sub_request_id. This caused vLLM to reject or hang on the duplicate request ID. Now we append '_iterN' to create unique IDs for each generation attempt within the same request.
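A minimal sketch of that fix, with hypothetical names (make_iteration_id and the example ids below are illustrative, not the PR's actual code):

```python
# Sketch: give each generation attempt within one logical request a unique
# vLLM request id, so a retry after a tool call never reuses an id the
# engine has already seen. Names here are hypothetical.

def make_iteration_id(sub_request_id: str, iteration: int) -> str:
    # First attempt keeps the original id; later attempts get "_iter1", "_iter2", ...
    return sub_request_id if iteration == 0 else f"{sub_request_id}_iter{iteration}"


if __name__ == "__main__":
    sub_request_id = "step12_prompt3"  # example id, not taken from the repo
    print([make_iteration_id(sub_request_id, i) for i in range(3)])
    # ['step12_prompt3', 'step12_prompt3_iter1', 'step12_prompt3_iter2']
```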
    refs = [
        engine.update_weight.remote(
-           name, dtype=param.dtype, shape=shape, empty_cache=count == num_params
+           name, dtype=str(param.dtype), shape=shape, empty_cache=count == num_params
This is needed to support the async engine's serialization, which, for some unknown reason, goes through MessagePack and not pickle.
-   assert dtype == self.model_config.dtype, f"mismatch dtype: src {dtype}, dst {self.model_config.dtype}"
+   assert dtype == str(self.model_config.dtype), (
    weight = torch.empty(shape, dtype=dtype, device="cuda")
As discussed earlier, this is done because of the weird serialization format in vLLM's async engine.
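To make the string round-trip concrete, here is a small sketch (the helper name and its use are assumptions, not necessarily the PR's code): the sender ships str(param.dtype) because torch.dtype objects don't survive MessagePack, and the receiver maps the string back to a real torch.dtype before allocating the buffer.

```python
# Sketch (assumed helper, not necessarily what the PR does): convert the
# dtype string sent over MessagePack back into a torch.dtype on the receiver.
import torch


def dtype_from_string(dtype_str: str) -> torch.dtype:
    # str(torch.bfloat16) == "torch.bfloat16", so strip the "torch." prefix
    # and look the attribute up on the torch module.
    dtype = getattr(torch, dtype_str.removeprefix("torch."))
    assert isinstance(dtype, torch.dtype), f"not a dtype: {dtype_str}"
    return dtype


if __name__ == "__main__":
    wire_dtype = str(torch.bfloat16)  # what update_weight now sends
    weight = torch.empty((4, 8), dtype=dtype_from_string(wire_dtype))
    print(weight.dtype)  # torch.bfloat16
```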
By moving to the async engine, we get a much cleaner central processing loop: each request is processed in a dedicated function, and we rely on vLLM to handle the batching.
SGLang and TensorRT-LLM both use async engines, so moving to vLLM's async API will make it much easier to test out those engines.
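For illustration, a minimal sketch of that per-request pattern (not the actual LLMRayActor code; the model name and request ids are placeholders): each request runs in its own coroutine, and vLLM's AsyncLLMEngine batches whatever is in flight.

```python
# Sketch of the per-request processing pattern the async engine enables.
# Illustrative only: not the PR's LLMRayActor, and the model is a placeholder.
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine


async def process_one_request(engine: AsyncLLMEngine, prompt: str, request_id: str) -> str:
    # engine.generate streams partial RequestOutputs; keep the last (finished) one.
    final = None
    async for output in engine.generate(prompt, SamplingParams(max_tokens=64), request_id):
        final = output
    return final.outputs[0].text


async def main() -> None:
    engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model="facebook/opt-125m"))
    prompts = ["Hello,", "The capital of France is"]
    # One coroutine per request; batching of concurrent requests happens inside vLLM.
    texts = await asyncio.gather(
        *(process_one_request(engine, p, f"req-{i}") for i, p in enumerate(prompts))
    )
    print(texts)


if __name__ == "__main__":
    asyncio.run(main())
```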
Runs with inflight_updates=True:
And with inflight_updates=False: