Adding a compatibility layer between langchain and the inference APIs. #206710

Status: Open
Labels: Team:AI Infra (AppEx AI Infrastructure Team)

pgayvallet opened this issue Jan 15, 2025 · 1 comment

pgayvallet commented Jan 15, 2025

The problem

At the moment, the only "common path" for all of our LLM calls is the stack connectors. The rest of the call chain differs entirely depending on whether developers are using langchain or the inference APIs.

Also worth noting that:

  • this "common path" code isn't even really common. We have multiple connectors, so codepath vary depending on which one is being used for a specific call, and even within connectors, different usages are calling different subActions (e.g security via langchain and o11y via the inference plugins aren't calling the exact same code in the openAI connector).
  • this "common path" code is very low level, making it difficult to implement common features in that layer. One example would be the token count logic that is implement at the stack connector's level and shared between connectors: It's mostly loosely-typed spaghetti, because of the nature of where that logic lives.

But... why is this a problem?

It makes it harder to implement low-level platform features

At the platform level, this is fairly problematic because it makes it very complicated to implement platform-wide LLM features that should live in those "low-level" layers.

For example, if we wanted some global LLM call logging or tracing today (and wanted to avoid doing so in the stack connectors for the reasons described in the previous section), we would be forced to implement it twice: once in some langchain layer (chatModel, event hook or other), and once in the inference APIs. It would also make it harder to have a unified format for such logging, as we would be using different inputs between those two "parts" of the feature.

It causes more maintenance

There are two parts here:

  1. a direct follow-up from the previous point: spread-out / split features intrinsically cause more maintenance, as we need to do things more than once, and potentially differently, every time we need to add or change something. So not having that common path causes more maintenance at the platform level.

  2. less significant, but there is maintenance involved for solutions too. As langchain isn't "officially" supported by platform, the custom langchain parts we're using in Kibana are maintained by solutions. Having a bridge between langchain and the inference APIs would allow those components to be officially supported, and maintained, by platform, reducing maintenance at the solutions level.

Proposed solution

The basic idea is to use the inference APIs as the sole chatModel provider for all of our usages of langchain within the Kibana platform. That would be achieved by having a custom langchain chatModel implementation that uses the inference chatComplete API under the hood (as prototyped in #206429).

Note that this approach does not change in any way how solutions are using langchain today - it's just about replacing an implementation (or rather, multiple implementations) of chatModel with a single one that communicates with the inference APIs, so the changes at the solutions' level should be minimal.
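
To make the shape of that bridge concrete, here is a minimal sketch of what such a chatModel could look like: a langchain `SimpleChatModel` subclass delegating the actual call to the inference chatComplete API. The `ChatCompleteApi` type, the message conversion and the class name are simplified assumptions for illustration only; the real implementation would also need to cover streaming, tool calling, token usage, etc.

```ts
import { SimpleChatModel } from '@langchain/core/language_models/chat_models';
import type { BaseMessage } from '@langchain/core/messages';

// Hypothetical, simplified shape of the inference chatComplete API (not the actual signature).
type ChatCompleteApi = (options: {
  connectorId: string;
  messages: Array<{ role: 'user' | 'assistant'; content: string }>;
}) => Promise<{ content: string }>;

// Minimal sketch: a langchain chatModel that delegates to the inference layer.
class InferenceChatModelSketch extends SimpleChatModel {
  constructor(
    private readonly chatComplete: ChatCompleteApi,
    private readonly connectorId: string
  ) {
    super({});
  }

  _llmType() {
    return 'inference';
  }

  async _call(messages: BaseMessage[]): Promise<string> {
    // Convert langchain messages into the inference message format
    // (system/tool messages omitted for brevity in this sketch)...
    const converted = messages.map((message) => ({
      role: message._getType() === 'human' ? ('user' as const) : ('assistant' as const),
      content:
        typeof message.content === 'string' ? message.content : JSON.stringify(message.content),
    }));
    // ...and let the inference plugin handle provider adapters, normalization, etc.
    const { content } = await this.chatComplete({
      connectorId: this.connectorId,
      messages: converted,
    });
    return content;
  }
}
```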

Main upsides of the approach

More control for platform

Having every LLM call go through the inference APIs would give us that "not-so-low-level" common code path for all LLM calls across the Kibana platform, addressing the problem described in this issue.

That would allow us to more easily introduce platform-level features for our LLM usages (see the sketch after this list), such as:

  • centralized tracing
  • centralized logging
  • centralized audit logging
  • (and potentially more).
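
As a hedged illustration of why the single code path matters: once everything funnels through chatComplete, one wrapper is enough to add platform-wide logging or tracing. The `ChatComplete` signature below is a simplified assumption, not the actual API.

```ts
import type { Logger } from '@kbn/logging';

// Hypothetical, simplified signature of the chatComplete API.
type ChatComplete = (options: {
  connectorId: string;
  messages: Array<{ role: string; content: string }>;
}) => Promise<{ content: string }>;

// Sketch: wrap chatComplete once, and every LLM call in Kibana gets logging/tracing.
const withCallLogging =
  (chatComplete: ChatComplete, logger: Logger): ChatComplete =>
  async (options) => {
    const startedAt = Date.now();
    logger.debug(`LLM call started [connectorId=${options.connectorId}]`);
    try {
      const response = await chatComplete(options);
      logger.debug(`LLM call succeeded in ${Date.now() - startedAt}ms`);
      return response;
    } catch (error) {
      logger.error(`LLM call failed after ${Date.now() - startedAt}ms: ${error}`);
      throw error;
    }
  };
```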

No disruption for solutions

For solutions using the inference APIs directly, no changes will be required.

For solutions using langchain, the only work required will be to change their chatModel selection logic to (always) use the new inference-based chatModel, and to perform some regression testing to make sure that everything works as expected with the new implementation. Platform could even help with that first part (see the sketch below).
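
For illustration, the selection logic could collapse into something like the helper below. The start-contract shape is an assumption sketched here as `InferenceStartSketch`; `getChatModel` mirrors the usage example from the prototype work referenced later in this issue and may differ in the final implementation.

```ts
import type { KibanaRequest } from '@kbn/core/server';

// Assumed shape of the inference plugin's start contract, for this sketch only.
interface InferenceStartSketch {
  getChatModel(options: {
    request: KibanaRequest;
    connectorId: string;
    chatModelOptions?: { temperature?: number };
  }): Promise<unknown>;
}

// Instead of picking a connector-specific chatModel class per LLM provider,
// solutions would always create the single inference-based chatModel.
async function createSolutionChatModel(
  inference: InferenceStartSketch,
  request: KibanaRequest,
  connectorId: string
) {
  return inference.getChatModel({
    request,
    connectorId,
    chatModelOptions: { temperature: 0 },
  });
}
```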

Less maintenance for solutions

Even if we know that, long term, we'd like to get rid of the "legacy" LLM connectors and the associated langchain models, it's still unclear when we will be able to do that. This compatibility approach would allow us to get rid of the custom langchain models today, and given that the new ChatModel implementation would be owned and maintained by platform, it would reduce the maintenance burden on solutions for those custom langchain components.

Allow cleanup of connectors

With every LLM call going through the inference APIs, we could then perform some extensive cleanup of the genAI connectors, only keeping the subActions that we know the inference APIs are using, and removing the rest. This would significantly clean up that part of the code, reducing maintenance costs along the way.

@pgayvallet commented:

cc @elastic/security-generative-ai @elastic/obs-ai-assistant @joemcelroy @sphilipse

pgayvallet added a commit that referenced this issue Jan 16, 2025
## Summary

Related to #206710

Add a `modelName` parameter to the chatComplete inference API, and wire
it accordingly on all adapters.

That parameter can be used to override the default model specified by
the connector, at call time.
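
As a rough illustration of the parameter described above (the call shape is simplified and `inferenceClient` is a placeholder, not an actual identifier):

```ts
// Placeholder client; the exact chatComplete signature is simplified here.
const response = await inferenceClient.chatComplete({
  connectorId: 'my-genai-connector',
  modelName: 'gpt-4o-mini', // overrides the model configured on the connector, for this call only
  messages: [{ role: 'user', content: 'What is Kibana?' }],
});
```
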
pgayvallet added a commit that referenced this issue Jan 23, 2025
## Summary

Related to #206710

Attach the `configuration` from the stack connector to the internal
`InferenceConnector` structure we're passing down to inference adapters.

This will allow the adapters to retrieve information from the stack
connector, which will be useful to introduce more provider-specific
logic (for example, automatically detecting whether the underlying
provider supports native function calling).

This is also a requirement for the langchain bridge, as the chatModel
will need to know which type of provider is used under the hood.
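
A hedged sketch of what this enables, with assumed field names (`config`, `apiProvider`) used purely for illustration rather than the actual structure:

```ts
// Simplified, assumed shape of the internal structure carrying the stack
// connector's configuration down to the inference adapters.
interface InferenceConnectorSketch {
  connectorId: string;
  type: string;
  name: string;
  config: Record<string, unknown>; // raw stack connector configuration
}

// Example of provider-specific logic an adapter could now implement
// (the `apiProvider` field and its values are assumptions for illustration).
const supportsNativeFunctionCalling = (connector: InferenceConnectorSketch): boolean => {
  const provider = connector.config.apiProvider as string | undefined;
  return provider !== undefined && provider !== 'Other';
};
```
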
pgayvallet added a commit that referenced this issue Feb 3, 2025
## Summary

Part of #206710

This PR introduces the `InferenceChatModel` class, which is a langchain
chatModel utilizing the inference APIs (`chatComplete`) under the hood.

Creating instances of `InferenceChatModel` can be done either by
manually importing the class from the new `@kbn/inference-langchain`
package, or by using the new `createChatModel` API exposed from the
inference plugin's start contract.

The main upside of using this chatModel is that the unification and
normalization layers are already taken care of by the inference
plugin, making sure that the underlying models are used with the
exact same capabilities. More details on the upsides and reasoning can
be found in the associated issue.

### Usage

Usage is very straightforward:

```ts
const chatModel = await inferenceStart.getChatModel({
  request,
  connectorId: myInferenceConnectorId,
  chatModelOptions: {
    temperature: 0.2,
  },
});

// just use it as another langchain chatModel, e.g.
const response = await chatModel.stream('What is Kibana?');
for await (const chunk of response) {
  // do something with the chunk
}
``` 

### Important

This PR only adds the implementation; it is not wired anywhere nor
used by any existing code yet. That is meant to be done at a later
stage. Merging the implementation first will allow distinct PRs
for the integration with search (playground) and security (assistant +
other workflows), with proper testing.

---------

Co-authored-by: kibanamachine <[email protected]>

pgayvallet self-assigned this Feb 5, 2025