Adding a compatibility layer between langchain and the inference APIs. #206710

Status: Open
Labels: Team:AI Infra (AppEx AI Infrastructure Team)

pgayvallet opened this issue Jan 15, 2025 · 1 comment

pgayvallet commented Jan 15, 2025

The problem

At the moment, the only "common path" for all of our LLM calls is the stack connectors. The rest of the call chain differs entirely depending on whether developers are using langchain or the inference APIs.

Also worth noting that:

  • this "common path" code isn't even really common. We have multiple connectors, so codepath vary depending on which one is being used for a specific call, and even within connectors, different usages are calling different subActions (e.g security via langchain and o11y via the inference plugins aren't calling the exact same code in the openAI connector).
  • this "common path" code is very low level, making it difficult to implement common features in that layer. One example would be the token count logic that is implement at the stack connector's level and shared between connectors: It's mostly loosely-typed spaghetti, because of the nature of where that logic lives.

But... why is this a problem?

It makes it harder to implement low-level platform features

At the platform level, this is fairly problematic because it makes it very complicated to implement platform-wide LLM features that should live in those "low-level" layers.

For example, if we wanted some global LLM call logging or tracing today (and wanted to avoid doing so in the stack connectors for the reasons described in the previous section), we would be forced to implement it twice: once in some langchain layer (chatModel, event hook or other), and once in the inference APIs. It would also make it harder to have a unified format for such logging, as we would be using different inputs between those two "parts" of the feature.

It causes more maintenance

There are two parts here:

  1. a direct follow-up from the previous point: spread-out / split features intrinsically cause more maintenance, as we need to do things more than once, and potentially differently, every time we need to add or change something. So not having that common path causes more maintenance at the platform level.

  2. less significant, but there is maintenance involved for solutions too. As langchain isn't "officially" supported by platform, the custom langchain parts we're using in Kibana are maintained by solutions. Having a bridge between langchain and the inference APIs would allow those components to be officially supported, and maintained, by platform, reducing maintenance at the solutions level.

Proposed solution

The basic idea is to use the inference APIs as the sole chatModel provider for all of our usages of langchain within the Kibana platform. That would be achieved by having a custom langchain chatModel implementation that uses the inference chatComplete API under the hood (as prototyped in #206429).

Note that this approach does not change in any way how solutions are using langchain today - it's just about replacing an implementation (or rather, multiple implementations) of chatModel with a single one that communicates with the inference APIs, so the changes at the solutions' level should be minimal.
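
To make the shape of that bridge concrete, here is a minimal sketch of what such a chatModel could look like: a langchain `SimpleChatModel` subclass delegating the actual call to the inference chatComplete API. The `ChatCompleteApi` type, the message conversion and the class name are simplified assumptions for illustration only; the real implementation would also need to cover streaming, tool calling, token usage, etc.

```ts
import { SimpleChatModel } from '@langchain/core/language_models/chat_models';
import type { BaseMessage } from '@langchain/core/messages';

// Hypothetical, simplified shape of the inference chatComplete API (not the actual signature).
type ChatCompleteApi = (options: {
  connectorId: string;
  messages: Array<{ role: 'user' | 'assistant'; content: string }>;
}) => Promise<{ content: string }>;

// Minimal sketch: a langchain chatModel that delegates to the inference layer.
class InferenceChatModelSketch extends SimpleChatModel {
  constructor(
    private readonly chatComplete: ChatCompleteApi,
    private readonly connectorId: string
  ) {
    super({});
  }

  _llmType() {
    return 'inference';
  }

  async _call(messages: BaseMessage[]): Promise<string> {
    // Convert langchain messages into the inference message format
    // (system/tool messages omitted for brevity in this sketch)...
    const converted = messages.map((message) => ({
      role: message._getType() === 'human' ? ('user' as const) : ('assistant' as const),
      content:
        typeof message.content === 'string' ? message.content : JSON.stringify(message.content),
    }));
    // ...and let the inference plugin handle provider adapters, normalization, etc.
    const { content } = await this.chatComplete({
      connectorId: this.connectorId,
      messages: converted,
    });
    return content;
  }
}
```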

Main upsides of the approach

More control for platform

Having every LLM call go through the inference APIs would give us that "not-so-low-level" common code path for all LLM calls across the Kibana platform, addressing the problem described in this issue.

That would allow us to more easily introduce platform-level features for our LLM usages (see the sketch after this list), such as:

  • centralized tracing
  • centralized logging
  • centralized audit logging
  • (and potentially more).
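
As a hedged illustration of why the single code path matters: once everything funnels through chatComplete, one wrapper is enough to add platform-wide logging or tracing. The `ChatComplete` signature below is a simplified assumption, not the actual API.

```ts
import type { Logger } from '@kbn/logging';

// Hypothetical, simplified signature of the chatComplete API.
type ChatComplete = (options: {
  connectorId: string;
  messages: Array<{ role: string; content: string }>;
}) => Promise<{ content: string }>;

// Sketch: wrap chatComplete once, and every LLM call in Kibana gets logging/tracing.
const withCallLogging =
  (chatComplete: ChatComplete, logger: Logger): ChatComplete =>
  async (options) => {
    const startedAt = Date.now();
    logger.debug(`LLM call started [connectorId=${options.connectorId}]`);
    try {
      const response = await chatComplete(options);
      logger.debug(`LLM call succeeded in ${Date.now() - startedAt}ms`);
      return response;
    } catch (error) {
      logger.error(`LLM call failed after ${Date.now() - startedAt}ms: ${error}`);
      throw error;
    }
  };
```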

No disruption for solutions

For solutions using the inference APIs directly, no changes will be required.

For solutions using langchain, the only work required will be to change their chatModel selection logic to (always) use the new inference-based chatModel, and to perform some regression testing to make sure that everything works as expected with the new implementation. Platform could even help with that first part (see the sketch below).
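
For illustration, the selection logic could collapse into something like the helper below. The start-contract shape is an assumption sketched here as `InferenceStartSketch`; `getChatModel` mirrors the usage example from the prototype work referenced later in this issue and may differ in the final implementation.

```ts
import type { KibanaRequest } from '@kbn/core/server';

// Assumed shape of the inference plugin's start contract, for this sketch only.
interface InferenceStartSketch {
  getChatModel(options: {
    request: KibanaRequest;
    connectorId: string;
    chatModelOptions?: { temperature?: number };
  }): Promise<unknown>;
}

// Instead of picking a connector-specific chatModel class per LLM provider,
// solutions would always create the single inference-based chatModel.
async function createSolutionChatModel(
  inference: InferenceStartSketch,
  request: KibanaRequest,
  connectorId: string
) {
  return inference.getChatModel({
    request,
    connectorId,
    chatModelOptions: { temperature: 0 },
  });
}
```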

Less maintenance for solutions

Even if we know that, long term, we'd like to get rid of the "legacy" LLM connectors and the associated langchain models, it's still unclear when we will be able to do that. This compatibility approach would allow us to get rid of the custom langchain models today, and given that the new ChatModel implementation would be owned and maintained by platform, it would reduce the maintenance burden on solutions for those custom langchain components.

Allow cleanup of connectors

With every LLM call going through the inference APIs, we could then perform some extensive cleanup of the genAI connectors, only keeping the subActions that we know the inference APIs are using, and removing the rest. This would significantly clean up that part of the code, reducing maintenance costs along the way.

@pgayvallet commented:

cc @elastic/security-generative-ai @elastic/obs-ai-assistant @joemcelroy @sphilipse

pgayvallet added a commit that referenced this issue Jan 16, 2025
## Summary

Related to #206710

Add a `modelName` parameter to the chatComplete inference API, and wire
it accordingly on all adapters.

That parameter can be used to override the default model specified by
the connector, at call time.
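
As a rough illustration of the parameter described above (the call shape is simplified and `inferenceClient` is a placeholder, not an actual identifier):

```ts
// Placeholder client; the exact chatComplete signature is simplified here.
const response = await inferenceClient.chatComplete({
  connectorId: 'my-genai-connector',
  modelName: 'gpt-4o-mini', // overrides the model configured on the connector, for this call only
  messages: [{ role: 'user', content: 'What is Kibana?' }],
});
```
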
pgayvallet added a commit that referenced this issue Jan 23, 2025
## Summary

Related to #206710

Attach the `configuration` from the stack connector to the internal
`InferenceConnector` structure we're passing down to inference adapters.

This will allow the adapters to retrieve information from the stack
connector, which will be useful to introduce more provider-specific
logic (for example, automatically detecting whether the underlying
provider supports native function calling).

This is also a requirement for the langchain bridge, as the chatModel
will need to know which type of provider is used under the hood.
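
A hedged sketch of what this enables, with assumed field names (`config`, `apiProvider`) used purely for illustration rather than the actual structure:

```ts
// Simplified, assumed shape of the internal structure carrying the stack
// connector's configuration down to the inference adapters.
interface InferenceConnectorSketch {
  connectorId: string;
  type: string;
  name: string;
  config: Record<string, unknown>; // raw stack connector configuration
}

// Example of provider-specific logic an adapter could now implement
// (the `apiProvider` field and its values are assumptions for illustration).
const supportsNativeFunctionCalling = (connector: InferenceConnectorSketch): boolean => {
  const provider = connector.config.apiProvider as string | undefined;
  return provider !== undefined && provider !== 'Other';
};
```
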
pgayvallet added a commit that referenced this issue Feb 3, 2025
## Summary

Part of #206710

This PR introduces the `InferenceChatModel` class, which is a langchain
chatModel utilizing the inference APIs (`chatComplete`) under the hood.

Creating instances of `InferenceChatModel` can be done either by
manually importing the class from the new `@kbn/inference-langchain`
package, or by using the new `createChatModel` API exposed from the
inference plugin's start contract.

The main upside of using this chatModel is that the unification and
normalization layers are already taken care of by the inference
plugin, making sure that the underlying models are used with the
exact same capabilities. More details on the upsides and reasoning can
be found in the associated issue.

### Usage

Usage is very straightforward:

```ts
const chatModel = await inferenceStart.getChatModel({
  request,
  connectorId: myInferenceConnectorId,
  chatModelOptions: {
    temperature: 0.2,
  },
});

// just use it as another langchain chatModel, e.g.
const response = await chatModel.stream('What is Kibana?');
for await (const chunk of response) {
  // do something with the chunk
}
``` 

### Important

This PR only adds the implementation; it is not wired anywhere nor
used by any existing code yet. That is meant to be done at a later
stage. Merging the implementation first will allow distinct PRs
for the integration with search (playground) and security (assistant +
other workflows), with proper testing.

---------

Co-authored-by: kibanamachine <[email protected]>

pgayvallet self-assigned this Feb 5, 2025