Adding a compatibility layer between langchain and the inference APIs. #206710
Labels: Team:AI Infra (AppEx AI Infrastructure Team)
cc @elastic/security-generative-ai @elastic/obs-ai-assistant @joemcelroy @sphilipse
pgayvallet added a commit that referenced this issue on Jan 16, 2025:

Summary: Related to #206710. Add a `modelName` parameter to the `chatComplete` inference API, and wire it accordingly in all adapters. That parameter can be used to override, at call time, the default model specified by the connector.
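As a rough illustration of the idea (not code from the PR: the client shape, connector id, and model name below are assumptions), the override would look like an extra optional parameter passed alongside the usual `chatComplete` arguments:

```ts
// Minimal assumed shape of the chatComplete API, for illustration only.
interface ChatCompleteLikeApi {
  (options: {
    connectorId: string;
    modelName?: string;
    messages: Array<{ role: 'user' | 'assistant'; content: string }>;
  }): Promise<{ content: string }>;
}

// `modelName` overrides the connector's default model for this single call.
async function summarizeWithOverride(chatComplete: ChatCompleteLikeApi) {
  return chatComplete({
    connectorId: 'my-genai-connector', // illustrative connector id
    modelName: 'gpt-4o-mini', // illustrative call-time override
    messages: [{ role: 'user', content: 'Summarize the open alerts' }],
  });
}
```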
kibanamachine pushed the same commit to kibanamachine/kibana on Jan 16, 2025 (cherry picked from commit e0092ad).
pgayvallet added a commit that referenced this issue on Jan 23, 2025:

Summary: Related to #206710. Attach the `configuration` from the stack connector to the internal `InferenceConnector` structure we're passing down to inference adapters. This will allow the adapters to retrieve information from the stack connector, which will be useful for introducing more provider-specific logic (for example, automatically detecting whether the underlying provider supports native function calling). This is also a requirement for the langchain bridge, as the chatModel will need to know which type of provider is used under the hood.
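The excerpt doesn't spell out the exact structure, so the following is only a sketch of the idea; the field names and the capability check are assumptions, not taken from the commit:

```ts
// Illustrative shape: the internal connector descriptor handed to inference
// adapters now carries the stack connector's raw configuration.
interface InferenceConnector {
  connectorId: string;
  type: string; // e.g. '.gen-ai', '.bedrock', ...
  name: string;
  config: Record<string, unknown>; // configuration coming from the stack connector
}

// An adapter could then branch on provider-specific details, e.g. whether the
// underlying provider supports native function calling.
function supportsNativeFunctionCalling(connector: InferenceConnector): boolean {
  const provider = connector.config.apiProvider as string | undefined;
  return provider === 'OpenAI'; // illustrative check only
}
```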
The same commits (the `modelName` change and the `configuration` change from #207202) were subsequently pushed to kibanamachine/kibana, viduni94/kibana, qn895/kibana, and JoseLuisGJ/kibana between Jan 23 and Jan 27, 2025 (cherry picked from commits e0092ad and a69236d).
pgayvallet added a commit that referenced this issue on Feb 3, 2025:

Summary: Part of #206710. This PR introduces the `InferenceChatModel` class, a langchain chatModel that uses the inference APIs (`chatComplete`) under the hood. Instances of `InferenceChatModel` can be created either by manually importing the class from the new `@kbn/inference-langchain` package, or by using the new `createChatModel` API exposed from the inference plugin's start contract. The main upside of using this chatModel is that the unification and normalization layers are already taken care of by the inference plugin, making sure that the underlying models are used with the exact same capabilities. More details on the upsides and reasoning are in the associated issue.

Usage is straightforward:

```ts
const chatModel = await inferenceStart.getChatModel({
  request,
  connectorId: myInferenceConnectorId,
  chatModelOptions: {
    temperature: 0.2,
  },
});

// just use it as any other langchain chatModel, e.g.
const response = await chatModel.stream('What is Kibana?');
for await (const chunk of response) {
  // do something with the chunk
}
```

Important: this PR only adds the implementation; it is not wired anywhere or used in any existing code. That is meant to be done in a later stage. Merging the implementation first allows distinct PRs for the integration with search (playground) and security (assistant + other workflows), with proper testing.

Co-authored-by: kibanamachine <[email protected]>
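The summary also mentions a second creation path: importing the class directly from `@kbn/inference-langchain`. Only the package and class names come from the PR text; the constructor options below are assumptions for illustration:

```ts
import { InferenceChatModel } from '@kbn/inference-langchain';

// Hypothetical direct construction: the exact constructor options are not
// shown in the PR summary, but wiring an inference client and a connector
// by hand is roughly what the manual path would look like.
const chatModel = new InferenceChatModel({
  chatComplete: inferenceClient.chatComplete, // assumed dependency
  connector: myInferenceConnector, // assumed dependency
  temperature: 0.2,
});
```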
pgayvallet also pushed this commit to pgayvallet/kibana on Feb 3, 2025 (cherry picked from commit 1c218f9, with conflicts in .github/CODEOWNERS).
The problem
At the moment, the only "common path" for all of our LLM calls is the stack connectors. The rest of the call chain differs entirely depending on whether developers are using langchain or the inference APIs.
But... why is this a problem?
It makes it harder to implement low-level platform features
At the platform level, this is fairly problematic because it makes it very complicated to implement platform-wide LLM features that should live in those "low-level" layers.
For example, if we wanted some global LLM call logging or tracing today (and wanted to avoid doing so in the stack connectors for the reasons described in the previous section), we would be forced to implement it twice: once in some langchain layer (chatModel, event hook, or other), and once in the inference APIs. It would also make it harder to have a unified format for such logging, as we would be working with different inputs in those two "parts" of the feature.
It causes more maintenance
There are two parts here:
- A direct follow-up from the previous point: spread-out / split features intrinsically cause more maintenance, as we need to do things more than once, and potentially differently, every time we add or change something. So not having that common path causes more maintenance at the platform level.
- Less significant, but there is maintenance involved for solutions too. As langchain isn't "officially" supported by platform, the custom langchain parts we're using in Kibana are maintained by solutions. Having a bridge between langchain and the inference APIs would allow those components to be officially supported, and maintained, by platform, reducing maintenance at the solutions level.
Proposed solution
The basic idea is to use the inference APIs as the sole chatModel provider for all of our usages of langchain within the Kibana platform. That would be achieved with a custom langchain `chatModel` implementation that uses the inference `chatComplete` API under the hood (as prototyped in #206429). Note that this approach does not differ in any way from how solutions are using langchain today; it is just about replacing an implementation (or rather, multiple implementations) of chatModel with a single one that communicates with the inference APIs, so the changes at the solutions' level should be minimal.
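As a rough sketch of the general shape of such a bridge (this is not the #206429 prototype; the inference client interface is an assumed placeholder), a custom langchain chat model only needs to translate langchain messages and delegate to `chatComplete`:

```ts
import {
  SimpleChatModel,
  type BaseChatModelParams,
} from '@langchain/core/language_models/chat_models';
import type { BaseMessage } from '@langchain/core/messages';

// Assumed minimal interface for the inference plugin's chatComplete API;
// the real signature lives in the inference plugin.
interface ChatCompleteApi {
  (options: {
    connectorId: string;
    messages: Array<{ role: 'user' | 'assistant' | 'system'; content: string }>;
  }): Promise<{ content: string }>;
}

interface InferenceChatModelParams extends BaseChatModelParams {
  chatComplete: ChatCompleteApi;
  connectorId: string;
}

const roleFor = (message: BaseMessage): 'user' | 'assistant' | 'system' => {
  const type = message._getType();
  return type === 'human' ? 'user' : type === 'system' ? 'system' : 'assistant';
};

// Sketch of the bridge: a langchain chat model whose only job is to convert
// langchain messages into the inference format and call chatComplete.
class InferenceChatModelSketch extends SimpleChatModel {
  private chatComplete: ChatCompleteApi;
  private connectorId: string;

  constructor(params: InferenceChatModelParams) {
    super(params);
    this.chatComplete = params.chatComplete;
    this.connectorId = params.connectorId;
  }

  _llmType(): string {
    return 'inference-chat-model';
  }

  async _call(messages: BaseMessage[]): Promise<string> {
    const response = await this.chatComplete({
      connectorId: this.connectorId,
      messages: messages.map((message) => ({
        role: roleFor(message),
        content: message.content as string,
      })),
    });
    return response.content;
  }
}
```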
Main upsides of the approach
More control for platform
Having every LLM call go through the inference APIs would give us that "not-so-low-level" common code path for all LLM calls across the Kibana platform, addressing the problem described in this issue.
That would allow us to more easily introduce platform-level features for our LLM usage, such as the global call logging or tracing described above.
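For instance (purely illustrative, not an existing API), once every call funnels through the common path, a cross-cutting concern like call logging only needs to be implemented once, as a wrapper around that path:

```ts
// Hypothetical wrapper around the common chatComplete path: logging, tracing,
// or usage accounting can be added in one place for every caller at once.
const withCallLogging =
  <Args extends { connectorId: string }, Result>(
    chatComplete: (options: Args) => Promise<Result>,
    log: (message: string) => void
  ) =>
  async (options: Args): Promise<Result> => {
    const startedAt = Date.now();
    try {
      return await chatComplete(options);
    } finally {
      log(`LLM call via connector ${options.connectorId} took ${Date.now() - startedAt}ms`);
    }
  };
```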
No disruption for solutions
For solutions using the inference APIs directly, no changes will be required.
For solutions using langchain, the only work required will be to change their chatModel selection logic to (always) use the new inference-based chatModel, and to perform some regression testing to make sure everything works as expected with the new implementation. Platform could even help with that first part.
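The migration would then mostly be a matter of swapping the construction call, with everything downstream unchanged. A hypothetical before/after, reusing the `getChatModel` shape from the usage example quoted earlier (the previous model class is illustrative):

```ts
// Before: solution-specific chatModel selection, e.g. a custom provider wrapper.
// const chatModel = new SomeSolutionSpecificChatModel({ ... });

// After: always use the inference-based chatModel.
const chatModel = await inferenceStart.getChatModel({
  request,
  connectorId,
  chatModelOptions: { temperature: 0 },
});

// Existing chains, agents, and prompts keep working as before, e.g.:
const answer = await chatModel.invoke('What is Kibana?');
```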
Less maintenance for solutions
Even if we know that, long term, we'd like to get rid of the "legacy" LLM connectors and the associated langchain models, it's still unclear when we will be able to do that. This compatibility approach would allow us to get rid of the custom langchain models today, and given that the new ChatModel implementation would be owned and maintained by platform, it would reduce the maintenance burden on solutions for those custom langchain components.
Allow cleanup of connectors
With every LLM call going through the inference APIs, we could then perform some extensive cleanup of the genAI connectors, keeping only the subActions that we know the inference APIs use and removing the rest. This would significantly clean up that part of the code, reducing maintenance costs along the way.