Session API #789

pblazej · 2025-09-18T12:32:15Z

Adds 3 basic building blocks for simple(r) agent experiences:

Session - connection, pre-connect, agent dispatch, agent filtering (e.g. by name), all agents, messages (broadcast and aggregated for now)
Agent - wrapper around Participant, knows its tracks and internal state
LocalMedia - (unrelated) helper to deal with local tracks in SwiftUI

Example: livekit-examples/agent-starter-swift#29

Sources/LiveKit/Agent/Chat/Receive/TranscriptionStreamReceiver.swift

Sources/LiveKit/Agent/Session.swift

pblazej · 2025-09-18T12:59:56Z

Sources/LiveKit/Agent/Conversation.swift

+
+    // MARK: - Init
+
+    public init(credentials: CredentialsProvider, room: Room = .init(), agentName: String? = nil, senders: [any MessageSender]? = nil, receivers: [any MessageReceiver]? = nil) {


@1egoman @lukasIO I think that's the discussion about the logic:

agentName should take part in the direct dispatch

we'll introduce plural case later while keeping an internal array? it creates some confusion why the conversation cannot happen "with multiple agents"

wait for agents can only happen at the conversation level (as the Agent will be published when joining)

I believe we should check the names vs who actually joined

name shouldn't take part in the filtering? so that I'll keep an agent that I formally did not pass?

> we'll introduce plural case later while keeping an internal array? it creates some confusion why the conversation cannot happen "with multiple agents"

Yes, this was generally what I had in mind: start with a singular agentName-type parameter, then in the future add an internal array which can be backed by a new agentNames parameter. (In Swift I'd think it could probably be a new overload? On web, the way to accomplish the same thing would be for the parameter to accept either a string or Array<string> and to be renamed, since parameter names aren't part of the external interface in JS.)

This link may be useful and represents the initial state of this on the web: https://github.com/livekit/components-js/pull/1207/files#diff-c2401cb9c778162d5def12d137a663f03477b02c804d9372846c318e903df77bR125-R146
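To make the "new overload" idea concrete on the Swift side, here is a minimal sketch. SessionOptions and both initializers are hypothetical names used only for illustration; the point is that a singular agentName entry point can already sit on top of an internal array, so a plural overload can be added later without breaking callers.

```swift
// Hypothetical sketch, not the SDK's API: the singular initializer is a thin
// wrapper over an internal array, so a plural overload can be added later.
struct SessionOptions {
    /// Internal storage is plural from day one.
    let agentNames: [String]

    /// Today's public surface: zero or one agent name.
    init(agentName: String? = nil) {
        agentNames = agentName.map { [$0] } ?? []
    }

    /// Possible future overload: several agent names.
    init(agentNames: [String]) {
        self.agentNames = agentNames
    }
}
```

Existing call sites such as SessionOptions(agentName: "a") keep compiling once the plural overload exists, because the two initializers differ by argument label.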

> wait for agents can only happen at the conversation level (as the Agent will be published when joining)

On the web right now, useConversation calls useAgent to get access to the current agent. On the agent I exposed a method called waitForAvailable, which returns a promise that resolves once agent.isAvailable is true. conversation.start() then calls await agent.waitForAvailable().

Associated code here: https://github.com/livekit/components-js/pull/1207/files#diff-c2401cb9c778162d5def12d137a663f03477b02c804d9372846c318e903df77bR306-R346
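For comparison, the same "wait until available" idea could look roughly like this with Swift concurrency. This is only a sketch under assumptions: AvailabilityGate, waitUntilAvailable and markAvailable are hypothetical names, not part of either SDK, and cancellation of a pending wait is deliberately left out.

```swift
// Hypothetical sketch of an awaitable availability flag, analogous to the
// web's agent.waitForAvailable(). None of these names exist in the SDK.
actor AvailabilityGate {
    private(set) var isAvailable = false
    private var waiters: [CheckedContinuation<Void, Never>] = []

    /// Suspends until `markAvailable()` has been called.
    /// Returns immediately if the flag is already set.
    func waitUntilAvailable() async {
        if isAvailable { return }
        await withCheckedContinuation { continuation in
            waiters.append(continuation)
        }
    }

    /// Sets the flag and resumes every pending waiter exactly once.
    func markAvailable() {
        isAvailable = true
        let pending = waiters
        waiters.removeAll()
        pending.forEach { $0.resume() }
    }
}
```

A start() method could then do `await gate.waitUntilAvailable()` after connecting, while the code that observes the participant calls `markAvailable()`.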

In the future, what I had been thinking (and another use case for conversation) is that it could maintain a registry of all currently connected agents. I sort of started going down the road of building that in JavaScript with useAgentTimeoutStore here (an internal hook, not exposed). Right now it is largely responsible for agent timeout management, but I could see it growing, storing data across multiple agents, and moving underneath conversation in the future.

@1egoman I feel like we're still scratching the surface with "absence" 🥲

My high-level approach (not implemented yet) would be:

Conversation controls the (global) timeout (as it does the dispatch), agent does not know about its own timeout, etc. 🟢

Currently, passing one agentName is a little misleading for people having multiple agents anyway (without direct dispatch)...

When agentName is passed:

we do wait for this particular agent to join

should this wait be awaitable or background

When multiple agentNames are passed:

do we wait for all?

When no agentName is passed:

we do wait for any agent 🟠

we do not wait at all?

Agent is added to the registry when it joins the room (agent lifecycle == participant lifecycle), regardless of its (conversational) state 🟠. Theoretically we can add them with "listening" | "thinking" | "speaking", maybe including "idle", but that either limits the space of states (if we don't register them - with the guarantee that they won't disappear after turning idle) or is a little confusing (if we register them with another concept of "availability"). IMO, consumers should learn what AgentState means for their use cases; it's hard to tell what "available" means universally.

I'm not sure that await agent.waitUntilAvailable(signal); is a great idea: should it be awaitable if you still want to present some placeholder UI while the agent joins/dispatches? This await can be modeled by the agent's optionality (Agent?) or its tracks anyway 🟠

How to model iCanSpeak state - which is crucial for the UI:

During pre-connect: capturingAudio + .disconnected, .connecting, .reconnecting, .connected

After pre-connect: probably connectionState (room) is not enough, but how to handle e.g. handoff - I'd rather interpret that as "I'm in the room and someone is listening" rather than "some agent X is listening" as there may be gaps, etc.

The key difference is probably in how we think about "presence" - shall we make some arbitrary decisions here or no?
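One possible reading of the iCanSpeak question, written out as a pure function. This is a sketch of a single interpretation, not a decision: ConnectionState here is a local stand-in with only the cases named above, and canSpeak and its parameter names are hypothetical.

```swift
// Stand-in for the room's connection state; only the cases named above.
enum ConnectionState { case disconnected, connecting, reconnecting, connected }

/// One possible reading of "can the user speak right now?".
/// The function and its parameter names are hypothetical.
func canSpeak(
    capturingAudio: Bool,
    connectionState: ConnectionState,
    someoneIsListening: Bool
) -> Bool {
    // During pre-connect the microphone is buffered locally, so the
    // connection state does not matter yet.
    if capturingAudio { return true }
    // After pre-connect: "I'm in the room and someone is listening",
    // without caring which agent that is (e.g. across a handoff).
    return connectionState == .connected && someoneIsListening
}
```

Keeping this as a function of plain inputs leaves the "presence" decision in one place, whichever definition of "someone is listening" is eventually chosen.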

bcherry

this generally looks good - lmk when the API is considered final

Sources/LiveKit/Agent/Session.swift

github-actions · 2025-10-15T09:01:47Z

⚠️ This PR does not contain any files in the .changes directory.

Sources/LiveKit/Agent/Session.swift

pblazej · 2025-10-16T11:44:48Z

JS diff

🟢 - good enough
🟠 - needs discussion
🔴 - complete mess

Session Lifecycle 🟢

Swift Session.start: Handles token fetch, optional pre-connect audio, and Room.connect, exposing readiness via isReady/isListening that mirror Room state.
JS useSession.start: Coordinates Room.connect through hooks, waits for transport plus useAgent.waitUntilAvailable, and surfaces async helpers (waitUntilConnected, waitUntilDisconnected) instead of stored readiness.
Connection Tracking: Swift mirrors Room.connectionState; JS recomputes derived booleans each render and broadcasts via an internal EventEmitter.
Room Ownership: Swift Session owns its Room; JS can take a provided/context Room or lazily create one.

Agent Representation 🟠

Multiplicity: Swift keeps a dictionary of Agent instances keyed by Participant.Identity; JS resolves a single agentParticipant plus optional worker. 🔴
State Propagation: Swift Agent subscribes to Participant.changes and publishes AgentState, audio, and avatar tracks via @Published; JS derives state by combining participant attributes, local microphone state, and timeout info to yield fields like isAvailable and isBufferingSpeech.
Access Pattern: Swift exposes agent(named:), subscript access, and environment wrappers (LiveKitAgent); JS returns a memoized object with helpers (waitUntilCamera, waitUntilMicrophone) backed by React context.

Messaging Pipeline 🟢

Swift Protocol Layer: Defines MessageSender/MessageReceiver with AsyncStream, defaulting to TextMessageSender loopback and TranscriptionStreamReceiver aggregating into an OrderedDictionary of ReceivedMessage.
JS Hook Composition: useSessionMessages merges useChat and useTranscriptions, sorts merged ReceivedMessage arrays, and emits via an EventEmitter.
Send Path: Swift Session.send multiplexes through configured senders and returns a SentMessage; JS delegates to useChat.send, returning a ReceivedChatMessage without centralized session error handling.
History: Swift exposes getMessageHistory/restoreMessageHistory; JS recomputes combined arrays each render and leaves persistence to consumers.

Timeout & Errors 🟠

Agent Arrival: Swift schedules waitForAgentTask after connect to set .agentNotConnected; JS uses useAgentTimeoutIdStore to surface failureReasons when an agent never joins or initializes. 🔴
Error Surface: Swift centralizes errors via Session.Error (.failedToConnect, .failedToSend); JS distributes through rejected promises, SessionEvent emitters, and agent state fields.
Reset: Swift offers resetError(); JS relies on hook reinitialization and timeout resets.

Media & Local Resources 🟢

Swift LocalMedia: Dedicated ObservableObject managing device toggles, selection, and LiveKitLocalMedia wrapper tied to the session Room.
JS View: useSession includes local media references (cameraTrack, microphoneTrack) via hooks like useLocalParticipant, avoiding a standalone media manager.
Pre-connect: Swift optionally wraps connection with room.withPreConnectAudio to manage isListening; JS infers buffering from a local microphone publication via isBufferingSpeech.

State Exposure & Reactivity 🟢

Swift Model: Uses ObservableObject and @Published properties consumed through SwiftUI wrappers (LiveKitSession, LiveKitAgent).
JS Integration: Returns plain objects optimized for React renders, with internal EventEmitters supporting async waits and updates.
Readiness Signals: Swift exposes convenience flags (isReady, hasAgents); JS relies on waitable helpers (waitUntilAvailable) and derived flags (isAvailable, failureReasons).

pblazej · 2025-10-16T13:25:41Z

I haven't included most of the comments - until the API is approved 🟢

1egoman

Did another pass through and left a few notes, but largely looks like things are aligned on web + swift which is great!

Sources/LiveKit/Agent/Session.swift

1egoman · 2025-10-16T14:10:10Z

Sources/LiveKit/Agent/Session.swift

+        defer {
+            waitForAgentTask = Task { [weak self] in
+                try await Task.sleep(nanoseconds: UInt64(timeout * Double(NSEC_PER_SEC)))
+                try Task.checkCancellation()
+                guard let self else { return }
+                if connectionState == .connected, agents.isEmpty {
+                    self.error = .agentNotConnected
+                }
+            }
+        }


thought: Just wanted to surface another thing you had mentioned in your comment here - I think you missed that the web implementation not only stores a list of failureReasons but ALSO transitions to a new failed state. That state transition IMO is the more important behavior.

issue: Reading through this, it looks like it doesn't handle the case of multiple agents joining properly because it is one global task, even though your agents data structure does handle multiple agents properly.

For example, I'm thinking of this scenario:

Agent a is dispatched

Wait 10 seconds

Agent b is dispatched

Wait 10 more seconds

At this point, both agents are still in a non-timed-out state even though a hasn't connected after 20 seconds (the default timeout).

I think in practice this case won't ever be an issue right now because the external Session interface won't let you do this, but IMO this bug shouldn't be left lurking when the plan is to allow behavior like this later, and I'd think it wouldn't be that hard to fix (start a timeout task for each agent independently).

Or alternatively, maybe it's worth just storing one agent internally for now like I mentioned in another comment, in which case what you are doing here I think works fine.
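A minimal sketch of the "one timeout task per agent" variant, to show roughly how much code it involves. AgentTimeouts and its methods are hypothetical names, agents are keyed by a plain string for brevity, and wiring the result into Session.error or an agent state is left out.

```swift
import Foundation

// Hypothetical sketch: one timeout task per dispatched agent instead of a
// single global waitForAgentTask. Names are illustrative only.
actor AgentTimeouts {
    private var tasks: [String: Task<Void, Never>] = [:]
    private(set) var timedOut: Set<String> = []

    /// Starts an independent clock for `agentName`, replacing any earlier one.
    func agentDispatched(_ agentName: String, timeout: TimeInterval) {
        tasks[agentName]?.cancel()
        tasks[agentName] = Task { [weak self] in
            do {
                try await Task.sleep(nanoseconds: UInt64(timeout * 1_000_000_000))
            } catch {
                return // cancelled: the agent joined or was re-dispatched
            }
            await self?.expire(agentName)
        }
    }

    /// Stops the clock for an agent that joined the room.
    func agentJoined(_ agentName: String) {
        tasks.removeValue(forKey: agentName)?.cancel()
        timedOut.remove(agentName)
    }

    private func expire(_ agentName: String) {
        // Runs inside the timeout task; skip if it was cancelled after waking up.
        guard !Task.isCancelled else { return }
        tasks[agentName] = nil
        timedOut.insert(agentName)
    }
}
```

In the scenario above, a would land in timedOut after its own timeout regardless of when b was dispatched.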

suggestion: I think I slightly prefer something closer to what I did on the web over what you did here - what do you think about this:

Add a failure case to AgentState here.

Apply this state to any agent that doesn't join after the specified timeout in the logic above, along with maybe some sort of more detailed error info on the agent (maybe continue the self.error pattern on Agent as well?)

I think it's fine to still store a value on session's self.error, as long as it's clear to implementers that the error means that "at least one agent didn't connect" and not "all agents didn't connect".

I agree, I subconsciously avoided adding more states 😄

Agent in SwiftUI world must be optional anyway, as it's injected via the magical Environment object, and we cannot just crash when it's not there

introducing the error state that you must handle (from the enum) just for the sake of the error scenario - maybe that's an overkill?

IMO the core difference is I think more in terms of Agent == Participant - it's there where it's there, while you do more of an AgentPromise kind of thing that may resolve with an error

My biggest argument here is still introducing more states that we need to handle sort of "manually" (are not just observations of the engine) may lead to awkward behaviors (like with the preconnect API when you need to "superimpose" your local state over the connection state).

I'll try to revisit the JS code, maybe find some consensus.

cc @lukasIO

> introducing the error state that you must handle (from the enum) just for the sake of the error scenario - maybe that's an overkill?

thought: My opinion - I see it as less overkill and more that without it, the agent state value isn't properly representing all the options, so the Agent ends up being a "leaky" state machine.

> IMO the core difference is I think more in terms of Agent == Participant - it's there where it's there, while you do more of an AgentPromise kind of thing that may resolve with an error

thought: Yea you are right, on the web returning a "placeholder" agent response is important for a js-related reason (preserving the ability to destructure the useAgent return). One other nice thing it unlocks is it means that if an end user queries an agent and gets back an empty value, they don't have to guess exactly what that "lack of value" means - the agent could be in the midst of dispatching, it could have not connected, etc. So with this approach you have to query somewhere else to get that state, and at least on the web, that state being "disconnected" from the agent object results in some ergonomic challenges (type narrowing via discriminated unions won't work properly).

> My biggest argument here is still introducing more states that we need to handle sort of "manually" (are not just observations of the engine) may lead to awkward behaviors (like with the preconnect API when you need to "superimpose" your local state over the connection state).

question: I'm not exactly sure I understand the preconnect API nuance you are describing, can you elaborate further?

thought: Reading this though, you are making me think that maybe your concerns would be alleviated by splitting up the AgentState enum into two levels. Then you are never replacing a state value from lk.agent.state with an "internal" state value. So something like:

type AgentState =
  | { state: "connecting" }
  | {
      state: "connected",
      agent: {
        state: 'initializing' | 'idle' | 'listening' | 'thinking' | 'speaking',
        /* todo: add other agent related metadata in here */
      },
    }
  | {
      state: "failed",
      failureReasons: Array<string>, // Or maybe something else, just some way to capture the "why" behind the failure
    }

(sorry for the typescript and not swift, I'm hoping that gets the point across!)
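A rough Swift rendering of the same two-level idea, as a sketch only. AgentConnection, ConversationalState and describe are hypothetical names; the inner cases are copied from the TypeScript above, and failure reasons are kept as plain strings as in that snippet.

```swift
// Hypothetical Swift rendering of the two-level state above.
enum AgentConnection {
    /// Inner level: values reported by the agent itself (copied from the
    /// TypeScript snippet), never overwritten by "internal" states.
    enum ConversationalState: String {
        case initializing, idle, listening, thinking, speaking
    }

    case connecting
    case connected(ConversationalState)
    case failed(reasons: [String])
}

/// Exhaustive switch: the compiler forces callers to handle the failure case.
func describe(_ connection: AgentConnection) -> String {
    switch connection {
    case .connecting:
        return "connecting"
    case let .connected(state):
        return "connected (\(state.rawValue))"
    case let .failed(reasons):
        return "failed: \(reasons.joined(separator: ", "))"
    }
}
```

The outer level is owned by the client (connection and timeout), the inner level is only ever an observation of the agent.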

@lukasIO so let's adopt web's additional state?

As you can see from above, my main point is that if we need to introduce an additional case, it means that AgentState is "malformed" internally (does not represent the assumptions).

@lukasIO @1egoman I started refactoring in this direction, the part I find awkward to port is:
https://github.com/livekit/components-js/blob/f118da6e678c4a91be91c4dfc9b3b61eb7f64e2a/packages/react/src/hooks/useAgent.ts#L394-L425

The obstacles:

it requires bi-directional Room observation

if (roomConnectionState !== ConnectionState.Disconnected) { state = 'connecting'; }

I'm unable to calculate preconnect state in the agent itself (as the local track is not associated with local participant at the time)

if (localMicTrack) { state = 'listening'; bufferingSpeachLocally = true; }

I could move the failed state to the agent, but:

to represent .connecting (and start the timeout clock) I need speculative Agent creation (not via simple unidirectional Room observation as it is right now)

Finally, the core difference is:

in JS Agent is a "computed state"

in Swift Agent is an object with its own lifecycle

why? I started with a similar approach, more like "value semantics", but having a class instead allows us to track local state (without the fear of mutating the copy), inject Agent separately, etc.

In other words, it would be easier now to move the whole state back to Session that would own agentState - making multiple agents more awkward.

OR

Publish separate Agent(s) where all the state is computed in the Session itself.

Hope it makes sense (?)

So, coming back to square one:

waitForAgentTask = Task { [weak self] in
    try await Task.sleep(nanoseconds: UInt64(timeout * Double(NSEC_PER_SEC)))
    try Task.checkCancellation()
    guard let self else { return }
    if state == .connected, agents.isEmpty {
        self.error = .agentNotConnected // I cannot transition the agent to failed state as agents.isEmpty by definition
    }
}

The closest alternative would be to move Session to a .failed state, but that's really just another representation of Error, just more awkward to consume on the frontend.

> it requires bi-directional Room observation

Maybe I'm misunderstanding "bi-directional" here, but the session itself should not depend on the agent's state.
A session's connection state (and connection) are separate from whether or not an agent is connected.

@lukasIO I mean the other way around - agent depends on session (room indeed): https://github.com/livekit/components-js/blob/f118da6e678c4a91be91c4dfc9b3b61eb7f64e2a/packages/react/src/hooks/useAgent.ts#L402

In Swift these are separate entities, as mentioned above. The agent is not a computed property over the session, so it must update its own local state, hence the word "bidirectional".

Sources/LiveKit/Agent/Session.swift

pblazej · 2025-10-22T12:08:14Z

@1egoman @lukasIO in the spirit of "talk is cheap", here's an alternative design, with more "stupid" Agent, separate states, etc.

a93fcf0

The key part is:

public enum Agent {
    case disconnected
    case connecting
    case connected(AgentState, (any AudioTrack)?, (any VideoTrack)?)
    case failed(Error)
}

I think the benefits have been justified earlier, possible downsides:

- no local mutable state in the agent - that may be a limitation
  - why not? because it would require either the knowledge of Room/Participant as it used to be or foreign mutation (from the Session) which is equally awkward
- the necessity to use "prism" aka computed property to pull stuff from this shiny pure enum impl
- the state machine that mutates the agent "state" isn't fully safe
  - what to do with multiple agents then? e.g. create them all with corresponding names as .connecting then resolve?
- the observation in SwiftUI isn't that granular (more view updates than really needed)
pblazej force-pushed the blaze/agent-conversation branch from 6ea1621 to 2f9bbee Compare September 18, 2025 12:38

pblazej commented Sep 18, 2025

Sources/LiveKit/Agent/Chat/Receive/TranscriptionStreamReceiver.swift

pblazej commented Sep 18, 2025

Sources/LiveKit/Agent/Session.swift Outdated

pblazej commented Sep 18, 2025

pblazej requested review from 1egoman, davidliu, hiroshihorie and lukasIO September 18, 2025 13:08

pblazej force-pushed the blaze/agent-conversation branch 2 times, most recently from 94ec7d0 to e5caee2 Compare September 18, 2025 13:34

bcherry reviewed Sep 22, 2025

Sources/LiveKit/Agent/Session.swift Outdated

pblazej force-pushed the blaze/agent-conversation branch from aa93417 to 212035c Compare September 23, 2025 12:02

pblazej marked this pull request as draft October 1, 2025 08:40

pblazej force-pushed the blaze/connection-provider branch from 51915ab to 0c89008 Compare October 2, 2025 08:10

Base automatically changed from blaze/connection-provider to main October 14, 2025 12:29

pblazej force-pushed the blaze/agent-conversation branch from c52f944 to 9b16217 Compare October 15, 2025 09:01

pblazej requested a review from xianshijing-lk October 15, 2025 15:57

pblazej added 12 commits October 16, 2025 13:24

Move basic Agent files

bc411aa

Fix inconsistencies

43861e8

Media state from participant

314b1c1

Naming

93f3081

Attributes gen

319798f

Transcription tests

d4a496e

Extract tests

dfa4db6

Renaming

54acf68

Pass token sources

bec88b6

Renaming

33b02f9

Extract Options

d141d6a

Split options

0e49e6f

pblazej added 3 commits October 16, 2025 13:24

Nest

7b954da

Weak

06609c1

Fix existential

ac90eb4

pblazej force-pushed the blaze/agent-conversation branch from ec9bdcb to ac90eb4 Compare October 16, 2025 11:26

pblazej changed the title ~~Conversation API~~ Session API Oct 16, 2025

pblazej commented Oct 16, 2025

Sources/LiveKit/Agent/Session.swift Outdated

pblazej added 2 commits October 16, 2025 13:48

Errors

d84ef9b

Sendable

4fcd651

pblazej marked this pull request as ready for review October 16, 2025 12:22

Older Swift

d5e6437

pblazej force-pushed the blaze/agent-conversation branch from 2074ccd to d5e6437 Compare October 16, 2025 12:56

1egoman reviewed Oct 16, 2025

CR: Session.withAgent factory

9cf68ff

lukasIO reviewed Oct 20, 2025

Sources/LiveKit/Agent/Session.swift Outdated

pblazej added 2 commits October 21, 2025 12:27

CR: Don't expose multiple agents

ae38145

Naming

2327745

pblazej force-pushed the blaze/agent-conversation branch from e8d9e6a to 2327745 Compare October 21, 2025 12:09

pblazej added 3 commits October 21, 2025 15:01

Use ordered dict

ddf20a5

Merge branch 'main' into blaze/agent-conversation

89e25f5

Alt design: Agent struct/enum

a93fcf0

pblazej added 2 commits October 22, 2025 16:28

Discussion: update logic from JS

961b7c4

Expose state again

15509a9

