Session API #789
base: main
```swift
// MARK: - Init

public init(credentials: CredentialsProvider, room: Room = .init(), agentName: String? = nil, senders: [any MessageSender]? = nil, receivers: [any MessageReceiver]? = nil) {
```
@1egoman @lukasIO I think this is the discussion about the logic:
- `agentName` should take part in the direct dispatch
  - we'll introduce the plural case later while keeping an internal array? It creates some confusion about why the conversation cannot happen "with multiple agents"
- waiting for agents can only happen at the conversation level (as the `Agent` will be published when joining)
  - I believe we should check the names against who actually joined
  - or shouldn't the name take part in the filtering, so that I'd keep an agent that I did not formally pass?
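To make "check the names against who actually joined" concrete, here is a minimal sketch of a name filter over joined participants. `JoinedParticipant` and `matchingAgents(in:expectedName:)` are hypothetical names, not LiveKit SDK types, and keeping every agent when no name is given is only one possible answer to the open filtering question above.

```swift
/// Hypothetical stand-in for a joined remote participant (not a LiveKit SDK type).
struct JoinedParticipant: Equatable {
    let identity: String
    let isAgent: Bool
    let agentName: String?
}

/// Keeps only participants that are agents. With `expectedName == nil` every
/// agent is kept; otherwise only agents whose name matches are kept.
func matchingAgents(in participants: [JoinedParticipant], expectedName: String?) -> [JoinedParticipant] {
    participants.filter { participant in
        guard participant.isAgent else { return false }
        guard let expectedName else { return true }
        return participant.agentName == expectedName
    }
}

// Example: one user and two agents with different names.
let joined = [
    JoinedParticipant(identity: "user", isAgent: false, agentName: nil),
    JoinedParticipant(identity: "agent-a", isAgent: true, agentName: "support"),
    JoinedParticipant(identity: "agent-b", isAgent: true, agentName: "sales"),
]
let supportAgents = matchingAgents(in: joined, expectedName: "support")
```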
> we'll introduce plural case later while keeping an internal array? it creates some confusion why the conversation cannot happen "with multiple agents"

This was generally what I had in mind, yes: start with a singular `agentName`-type parameter, and then in the future add an internal array which can be backed by a new `agentNames` parameter. (In Swift I'd think it could probably be a new overload? On web, the way to accomplish the same thing would be for the parameter to accept either a `string` or an `Array<string>`, and to rename it, since parameter names aren't part of the external interface in JS.)
This link may be useful and represents the initial state of this on the web: https://github.com/livekit/components-js/pull/1207/files#diff-c2401cb9c778162d5def12d137a663f03477b02c804d9372846c318e903df77bR125-R146
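A rough Swift sketch of that plan, assuming a hypothetical `AgentDispatchOptions` type (not part of the PR): the public surface starts with a singular `agentName`, while the storage is already an array that a future `agentNames` overload can fill.

```swift
/// Hypothetical options type: a singular `agentName` entry point today,
/// backed by an array that a later plural overload can populate.
struct AgentDispatchOptions {
    /// Readable by callers, but only settable through the initializers.
    private(set) var agentNames: [String]

    /// Today's singular entry point; `nil` means "no specific agent".
    init(agentName: String? = nil) {
        agentNames = agentName.map { [$0] } ?? []
    }

    /// Possible future overload for dispatching several named agents.
    init(agentNames: [String]) {
        self.agentNames = agentNames
    }
}

let single = AgentDispatchOptions(agentName: "support")
let none = AgentDispatchOptions()
let several = AgentDispatchOptions(agentNames: ["support", "sales"])
```

Adding the plural overload later leaves existing `agentName:` call sites source-compatible.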
> wait for agents can only happen at the conversation level (as the Agent will be published when joining)

On the web right now, what I'm doing is this: in `useConversation` I'm calling `useAgent` to get access to the current agent, and on the agent I exposed a method called `waitForAvailable`, which returns a promise that resolves once `agent.isAvailable` is true. `await agent.waitForAvailable()` is then called in `conversation.start()`.
Associated code here: https://github.com/livekit/components-js/pull/1207/files#diff-c2401cb9c778162d5def12d137a663f03477b02c804d9372846c318e903df77bR306-R346
In the future, what I had been thinking (and another use case for conversation) is that it could maintain a registry of all currently connected agents. I started going down the road of building that in JavaScript, sort of, with `useAgentTimeoutStore` here (internal hook, not exposed). Right now it is largely responsible for agent timeout management, but I could see it growing to store data across multiple agents and moving underneath conversation in the future.
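For comparison, a Swift registry along those lines could be a plain value keyed by participant identity, with entries added on join and removed on leave. `AgentRegistry` and its methods are hypothetical names for illustration, and timeout bookkeeping is deliberately left out of this sketch.

```swift
/// Hypothetical registry of currently connected agents, keyed by participant
/// identity. An entry exists exactly while the agent's participant is in the room.
struct AgentRegistry {
    private var namesByIdentity: [String: String] = [:]

    var isEmpty: Bool { namesByIdentity.isEmpty }

    /// Identities of connected agents, sorted for stable output.
    var connectedIdentities: [String] { namesByIdentity.keys.sorted() }

    mutating func agentJoined(identity: String, name: String) {
        namesByIdentity[identity] = name
    }

    mutating func agentLeft(identity: String) {
        namesByIdentity.removeValue(forKey: identity)
    }

    /// The name an agent registered with, if it is still connected.
    func name(of identity: String) -> String? {
        namesByIdentity[identity]
    }
}

var registry = AgentRegistry()
registry.agentJoined(identity: "agent-a", name: "support")
registry.agentJoined(identity: "agent-b", name: "sales")
registry.agentLeft(identity: "agent-a")
```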
@1egoman I think we're still scratching the surface with "absence" 🥲
My high-level approach (not implemented yet) would be:
- Conversation controls the (global) timeout (as it does the dispatch); the agent does not know about its own timeout, etc. 🟢
- Currently, passing one `agentName` is a little misleading for people who have multiple agents anyway (without direct dispatch)...
- When `agentName` is passed:
  - we do wait for this particular agent to join
    - should this wait be awaitable or in the background?
- When multiple `agentNames` are passed:
  - do we wait for all?
- When no `agentName` is passed:
  - we do wait for any agent 🟠
  - we do not wait at all?
- `Agent` is added to the registry when it joins the room (agent lifecycle == participant lifecycle), regardless of its (conversational) state 🟠
  - theoretically we can add them with `"listening" | "thinking" | "speaking"`, maybe including `"idle"`, but that either limits the space of states (if we don't register them, with the guarantee that they won't disappear after turning `idle`) or is a little confusing (if we register them with another concept of "availability"). IMO, consumers should learn what `AgentState` means for their use cases; it's hard to tell what "available" means universally.
- I'm not sure if doing `await agent.waitUntilAvailable(signal);` is a great idea. Should it be awaitable if you still want to present some placeholder UI while the agent joins/dispatches? This `await` can be modeled by the agent's optionality (`Agent?`) / its tracks anyway 🟠
- How to model the `iCanSpeak` state, which is crucial for the UI:
  - During pre-connect: `capturingAudio` + `.disconnected`, `.connecting`, `.reconnecting`, `.connected`
  - After pre-connect: probably `connectionState` (room) is not enough, but how to handle e.g. handoff? I'd rather interpret that as "I'm in the room and someone is listening" than "some agent X is listening", as there may be gaps, etc.
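The branches listed above for passing one name, several names, or none can be written down as an explicit policy. This is a sketch only: `AgentWaitPolicy` is a hypothetical type, and the `.allNamed` (wait for all), `.anyAgent`, and `.noWait` semantics are assumptions standing in for questions that are still open.

```swift
/// Hypothetical wait policy; the semantics of each case are assumptions.
enum AgentWaitPolicy {
    case named(String)       // one `agentName` passed: wait for that agent
    case allNamed([String])  // several names passed: wait for all of them
    case anyAgent            // no name passed: wait for any agent
    case noWait              // no name passed: do not wait at all

    /// Whether the wait is over, given the names of agents that have joined.
    func isSatisfied(joinedAgentNames: Set<String>) -> Bool {
        switch self {
        case .named(let name):
            return joinedAgentNames.contains(name)
        case .allNamed(let names):
            return Set(names).isSubset(of: joinedAgentNames)
        case .anyAgent:
            return !joinedAgentNames.isEmpty
        case .noWait:
            return true
        }
    }
}
```

Making the "wait for all" versus "wait for any" choice an explicit value keeps it visible at the call site instead of implied by whether a name was passed.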
The key difference is probably in how we think about "presence": shall we make some arbitrary decisions here or not?
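One possible reading of the `iCanSpeak` item, as a sketch: during pre-connect, capturing audio is enough whatever the connection state, and afterwards the user must be in the room with some agent listening, so a handoff between agents does not hinge on one specific agent. `RoomConnectionState`, `AgentActivity`, and `canSpeak` are simplified, hypothetical stand-ins, not SDK types.

```swift
/// Simplified stand-in for a room connection state (not the SDK's enum).
enum RoomConnectionState {
    case disconnected, connecting, reconnecting, connected
}

/// Simplified stand-in for an agent's conversational state.
enum AgentActivity {
    case initializing, idle, listening, thinking, speaking
}

/// Sketch of `iCanSpeak`: pre-connect only requires local capture; after
/// pre-connect the room must be connected and at least one agent listening.
func canSpeak(
    isPreConnecting: Bool,
    capturingAudio: Bool,
    connectionState: RoomConnectionState,
    agentStates: [AgentActivity]
) -> Bool {
    if isPreConnecting {
        // Any of the four connection states is acceptable while buffering locally.
        return capturingAudio
    }
    return connectionState == .connected && agentStates.contains(.listening)
}
```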
this generally looks good - lmk when the API is considered final
JS diff (🟢 = good enough):
- Session Lifecycle 🟢
- Agent Representation 🟠
- Messaging Pipeline 🟢
- Timeout & Errors 🟠
- Media & Local Resources 🟢
- State Exposure & Reactivity 🟢
I haven't incorporated most of the comments yet; that waits until the API is approved 🟢
Did another pass and left a few notes, but it largely looks like things are aligned on web + Swift, which is great!
```swift
defer {
    // Single session-wide timeout: after `timeout` seconds, flag an error
    // if the room is connected but no agent has joined yet.
    waitForAgentTask = Task { [weak self] in
        try await Task.sleep(nanoseconds: UInt64(timeout * Double(NSEC_PER_SEC)))
        try Task.checkCancellation()
        guard let self else { return }
        if connectionState == .connected, agents.isEmpty {
            self.error = .agentNotConnected
        }
    }
}
```
thought: Just wanted to surface another thing you had mentioned in your comment here: I think you missed that the web implementation not only stores a list of `failureReasons` but ALSO transitions to a new `failed` state. That state transition IMO is the more important behavior.
issue: Reading through this, it looks like it doesn't handle the case of multiple agents joining properly, because it is one global task, even though your `agents` data structure does handle multiple agents properly.
For example, I'm thinking of this scenario:
- Agent `a` is dispatched
- Wait 10 seconds
- Agent `b` is dispatched
- Wait 10 more seconds
- At this point, both agents are still in a non-timed-out state, even though `a` hasn't connected after 20 seconds (the default timeout).
I think in practice this case won't ever be an issue right now, because the external `Session` interface won't let you do this, but IMO this bug shouldn't be left lurking when the plan is to allow behavior like this later, and I'd think it wouldn't be that hard to fix (start a timeout task for each agent independently).
Or alternatively, maybe it's worth just storing one agent internally for now, like I mentioned in another comment, in which case I think what you are doing here works fine.
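The per-agent fix can be sketched as follows. Rather than spawning one `Task` per agent as suggested, this variant stores a deadline per agent and checks it against a supplied time, which gives the same per-agent behavior and can be exercised without sleeping; a real implementation could still drive it from a timer or from per-agent tasks. `PerAgentTimeouts` is hypothetical, and the 20-second default mirrors the default timeout mentioned above.

```swift
/// Hypothetical per-agent timeout bookkeeping. Each dispatched agent gets its
/// own deadline, so a later dispatch never pushes back an earlier agent's timeout.
/// Times are plain seconds so the logic can be checked without sleeping.
struct PerAgentTimeouts {
    let timeout: Double
    private var deadlines: [String: Double] = [:] // agent name -> deadline
    private var joinedAgents: Set<String> = []

    init(timeout: Double = 20) {
        self.timeout = timeout
    }

    mutating func dispatched(_ name: String, at time: Double) {
        deadlines[name] = time + timeout
    }

    mutating func markJoined(_ name: String) {
        joinedAgents.insert(name)
    }

    /// Agents that were dispatched, have not joined, and whose own deadline has passed.
    func timedOutAgents(now: Double) -> [String] {
        deadlines
            .filter { entry in !joinedAgents.contains(entry.key) && now >= entry.value }
            .map { $0.key }
            .sorted()
    }
}

// The scenario above: `a` dispatched at t = 0, `b` at t = 10.
var timeouts = PerAgentTimeouts()
timeouts.dispatched("a", at: 0)
timeouts.dispatched("b", at: 10)
```

At t = 20 only `a` has used up its own 20 seconds, which is exactly the case a single global task misses.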
suggestion: I think I slightly prefer something closer to what I did on the web over what you did here. What do you think about this:
- Add a `failure` case to `AgentState` here.
- Apply this state to any agent that doesn't join after the specified timeout in the logic above, along with maybe some sort of more detailed error info on the agent (maybe continue the `self.error` pattern on `Agent` as well?)
- I think it's fine to still store a value on the session's `self.error`, as long as it's clear to implementers that the error means "at least one agent didn't connect" and not "all agents didn't connect".
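A minimal sketch of that suggestion, using hypothetical stand-in types rather than the PR's `AgentState`: timed-out agents move to a `failed` case carrying a reason, and the session-level error is named so it clearly means "at least one agent didn't connect". It assumes an agent value exists before its participant joins (the speculative `Agent` creation that comes up later in the thread); without that, there is nothing to mark as failed.

```swift
/// Hypothetical agent state including the proposed failure case.
enum SketchAgentState: Equatable {
    case connecting
    case listening, thinking, speaking
    case failed(reason: String)
}

/// Hypothetical agent value; `state` carries the per-agent error detail.
struct SketchAgent {
    let name: String
    var state: SketchAgentState
}

/// Session-level error that explicitly means "at least one agent didn't connect".
enum SketchSessionError: Error, Equatable {
    case agentsNotConnected(names: [String])
}

/// Marks every still-connecting agent as failed and returns a session error
/// if at least one agent failed. Agents that already joined are left alone.
func applyTimeout(to agents: inout [SketchAgent]) -> SketchSessionError? {
    var failedNames: [String] = []
    for index in agents.indices where agents[index].state == .connecting {
        agents[index].state = .failed(reason: "did not join before the timeout")
        failedNames.append(agents[index].name)
    }
    return failedNames.isEmpty ? nil : .agentsNotConnected(names: failedNames)
}

var agents = [
    SketchAgent(name: "a", state: .connecting),
    SketchAgent(name: "b", state: .listening),
]
let sessionError = applyTimeout(to: &agents)
```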
I agree, I subconsciously avoided adding more states 😄
- `Agent` in the SwiftUI world must be optional anyway, as it's injected via the magical `Environment` object, and we cannot just crash when it's not there
  - introducing an error state that you must handle (from the enum) just for the sake of the error scenario: maybe that's overkill?
- IMO the core difference is more in terms of `Agent == Participant`: it's there when it's there, while you do more of an `AgentPromise` kind of thing that may resolve with an error
My biggest argument here is still that introducing more states that we need to handle sort of "manually" (states that are not just observations of the engine) may lead to awkward behaviors (like with the preconnect API, when you need to "superimpose" your local state over the connection state).
I'll try to revisit the JS code, maybe find some consensus.
cc @lukasIO
> introducing the error state that you must handle (from the enum) just for the sake of the error scenario - maybe that's an overkill?

thought: My opinion: I see it less as overkill and more that without it, the agent `state` value isn't properly representing all the options, so the `Agent` ends up being a "leaky" state machine.
> IMO the core difference is I think more in terms of Agent == Participant - it's there where it's there, while you do more of an AgentPromise kind of thing that may resolve with an error

thought: Yeah, you are right. On the web, returning a "placeholder" agent response is important for a JS-related reason (preserving the ability to destructure the `useAgent` return). One other nice thing it unlocks: when an end user queries an agent, they don't get back an empty value whose meaning they have to guess (the agent could be in the midst of dispatching, it could have failed to connect, etc.). With an empty value you have to query somewhere else to get that state, and at least on the web, that state being "disconnected" from the agent object results in some ergonomic challenges (type narrowing via discriminated unions won't work properly).
> My biggest argument here is still introducing more states that we need to handle sort of "manually" (are not just observations of the engine) may lead to awkward behaviors (like with the preconnect API when you need to "superimpose" your local state over the connection state).

question: I'm not exactly sure I understand the preconnect API nuance you are describing; can you elaborate further?
thought: Reading this, though, you are making me think that maybe your concerns would be alleviated by splitting the `AgentState` enum into two levels. Then you are never replacing a state value from `lk.agent.state` with an "internal" state value. So something like:
```ts
type AgentState =
  | { state: "connecting" }
  | {
      state: "connected",
      agent: {
        state: 'initializing' | 'idle' | 'listening' | 'thinking' | 'speaking',
        /* todo: add other agent related metadata in here */
      },
    }
  | {
      state: "failed",
      failureReasons: Array<string>, // Or maybe something else, just some way to capture the "why" behind the failure
    };
```
(sorry for the TypeScript and not Swift, I'm hoping that gets the point across!)
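On the Swift side, the same two-level shape maps naturally onto an enum with associated values. This is an illustrative translation with hypothetical names (`TwoLevelAgentState`, `AgentConversationalState`), not the PR's API; the raw values reuse the strings listed in the TypeScript above, and treating them as the format of `lk.agent.state` is an assumption.

```swift
/// Inner level: states reported by the agent itself, never set locally.
enum AgentConversationalState: String {
    case initializing, idle, listening, thinking, speaking
}

/// Outer level: owned by the client, so it never overwrites an agent-reported value.
enum TwoLevelAgentState {
    case connecting
    case connected(agent: AgentConversationalState)
    case failed(failureReasons: [String])

    /// The agent's own state, if it has connected.
    var conversationalState: AgentConversationalState? {
        if case .connected(let agent) = self { return agent }
        return nil
    }

    var isFailed: Bool {
        if case .failed = self { return true }
        return false
    }
}

// Mapping a raw state string into the inner level; an unknown string stays `.connecting`.
let inner = AgentConversationalState(rawValue: "listening")
let state: TwoLevelAgentState = inner.map { TwoLevelAgentState.connected(agent: $0) } ?? .connecting
```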
@lukasIO so shall we adopt web's additional state?
As you can see from above, my main point is that if we need to introduce an additional case, it means that `AgentState` is "malformed" internally (does not represent our assumptions).
@lukasIO @1egoman I started refactoring in this direction; the part I find awkward to port is:
https://github.com/livekit/components-js/blob/f118da6e678c4a91be91c4dfc9b3b61eb7f64e2a/packages/react/src/hooks/useAgent.ts#L394-L425
The obstacles:
- it requires bi-directional `Room` observation
```ts
if (roomConnectionState !== ConnectionState.Disconnected) {
  state = 'connecting';
}
```
- I'm unable to calculate the preconnect state in the agent itself (as the local track is not associated with the local participant at that time)
```ts
if (localMicTrack) {
  state = 'listening';
  bufferingSpeachLocally = true;
}
```
- I could move the `failed` state to the agent, but:
  - to represent `.connecting` (and start the timeout clock) I need speculative `Agent` creation (not via simple unidirectional `Room` observation as it is right now)
Finally, the core difference is:
- in JS, Agent is a "computed state"
- in Swift, Agent is an object with its own lifecycle
  - why? I started with a similar approach, more like "value semantics", but having a class instead allows us to track local state (without the fear of mutating a copy), inject Agent separately, etc.
In other words, it would now be easier to either move the whole state back to `Session`, which would own `agentState`, making multiple agents more awkward,
OR
publish separate Agent(s) where all the state is computed in the `Session` itself.
Hope it makes sense (?)
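To illustrate the second option (all state computed in the `Session`), here is a pure function over inputs a session can already observe, loosely modeled on the two JS snippets quoted above: a non-disconnected room yields `connecting`, a capturing local mic overrides that during pre-connect, and a timeout yields `failed` until an agent actually reports a state. All names are hypothetical, and the precedence between the checks is an assumption.

```swift
/// Simplified room state a Session could observe (not the SDK's enum).
enum RoomState { case disconnected, connecting, connected }

/// Hypothetical agent state computed entirely inside the Session.
enum ComputedAgentState: Equatable {
    case disconnected
    case connecting
    case preConnectListening  // local mic is buffering before the agent joins
    case connected(String)    // the agent's own reported state, e.g. "listening"
    case failed
}

/// Derives the agent state from session-level observations only.
func computeAgentState(
    room: RoomState,
    localMicCapturing: Bool,
    agentReportedState: String?, // nil until an agent participant has joined
    timedOut: Bool
) -> ComputedAgentState {
    if let reported = agentReportedState { return .connected(reported) }
    if timedOut { return .failed }
    if localMicCapturing { return .preConnectListening }
    return room == .disconnected ? .disconnected : .connecting
}
```

Because the function only reads inputs, publishing one computed value per agent would not require the agent object to observe the room itself.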
So, coming back to square one:
```swift
waitForAgentTask = Task { [weak self] in
    try await Task.sleep(nanoseconds: UInt64(timeout * Double(NSEC_PER_SEC)))
    try Task.checkCancellation()
    guard let self else { return }
    if state == .connected, agents.isEmpty {
        // I cannot transition the agent to a failed state, as `agents` is empty by definition here
        self.error = .agentNotConnected
    }
}
```
The closest alternative would be to move `Session` to a `.failed` state, but that's really just another representation of `Error`, only more awkward to consume on the frontend.
> it requires bi-directional Room observation

Maybe I'm misunderstanding "bi-directional" here, but the session itself should not depend on the agent's state.
A session's connection state (and connection) are separate from whether or not an agent is connected.
@lukasIO I mean the other way around: the agent depends on the session (the room, indeed): https://github.com/livekit/components-js/blob/f118da6e678c4a91be91c4dfc9b3b61eb7f64e2a/packages/react/src/hooks/useAgent.ts#L402
In Swift these are separate entities, as mentioned above; the agent is not a computed property over the session, so it must update its own local state, hence the word "bidirectional".
@1egoman @lukasIO in the spirit of "talk is cheap", here's an alternative design, with more "stupid"
The key part is:
```swift
public enum Agent {
    case disconnected
    case connecting
    case connected(AgentState, (any AudioTrack)?, (any VideoTrack)?)
    case failed(Error)
}
```
I think the benefits have been justified earlier; possible downsides:
Adds 3 basic building blocks for simple(r) agent experiences:
- `Session` - connection, pre-connect, agent dispatch, agent filtering (e.g. by name), all agents, messages (broadcast and aggregated for now)
- `Agent` - wrapper around `Participant`, knows its tracks and internal state
- `LocalMedia` - (unrelated) helper to deal with local tracks in SwiftUI

Example: livekit-examples/agent-starter-swift#29