Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add grounding convenience function #378

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

deekshas8
Copy link
Contributor

Context

SAP/ai-sdk-js-backlog#178.

Please provide a description of what your change does and why it is needed.

Definition of Done

  • Code is tested (Unit, E2E)
  • Error handling created / updated & covered by the tests above
  • Documentation updated
  • (Optional) Aligned changes with the Java SDK
  • (Optional) Release notes updated

@jjtang1985
Copy link
Contributor

While providing ids for data_repositories is helpful as this is the parameter from orchestration API perspective, there might be another idea.

When doing a normal chat completion, you also need a deployment_id.
The SDK allows users for providing the model name and does the search for the user.

Is it possible to do similar things here?
If a user knows the title of data_repository, should we also search for the id?

packages/orchestration/README.md Outdated Show resolved Hide resolved
.changeset/shiny-brooms-punch.md Outdated Show resolved Hide resolved
@deekshas8 deekshas8 requested a review from shibeshduw December 27, 2024 15:57
Copy link
Contributor

@ZhongpinWang ZhongpinWang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written also separately below, I would use the term orchestration grounding instead of document grounding or mention document grounding service as this is a dedicated service and has nothing to do with orchestration service.

Some other minor suggestions.

.changeset/shiny-brooms-punch.md Outdated Show resolved Hide resolved
packages/orchestration/README.md Outdated Show resolved Hide resolved
packages/orchestration/README.md Outdated Show resolved Hide resolved
packages/orchestration/README.md Outdated Show resolved Hide resolved
packages/orchestration/src/orchestration-types.ts Outdated Show resolved Hide resolved
packages/orchestration/src/orchestration-util.ts Outdated Show resolved Hide resolved
packages/orchestration/src/orchestration-util.ts Outdated Show resolved Hide resolved
packages/orchestration/src/orchestration-util.ts Outdated Show resolved Hide resolved
packages/orchestration/src/orchestration-util.ts Outdated Show resolved Hide resolved
.changeset/shiny-brooms-punch.md Outdated Show resolved Hide resolved
packages/orchestration/src/orchestration-filter-utility.ts Outdated Show resolved Hide resolved
packages/orchestration/src/orchestration-util.ts Outdated Show resolved Hide resolved
/**
* Represents the configuration for the Document Grounding Service.
*/
export interface DocumentGroundingServiceConfig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[q] To understand for myself better. Why did you choose to declare this as an interface instead of a type?
I see the original config that gets generated is a type.

Comment on lines 211 to 220
filters: [
{
id: 'filter-id',
data_repositories: ['repo1', 'repo2'],
data_repository_type: 'custom-type'
}
],
input_params: ['input'],
output_param: 'output'
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[q] I played around a bit with if null values can be set for properties inside filters object. I found that for id and data_repository_type this is possible as all of them resolve to any.

Should we consider also restricting these values to string in our types for useful typehints in autocompletion?
Or do you think this should rather be fixed by the Orchestration team instead?

Copy link
Contributor Author

@deekshas8 deekshas8 Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unfortunately comes from the spec. @MatKuhr This seems wrong to me, combining and empty object and enum in schema. Can we suggest an improvement:

SearchSelectOptionEnum:
  anyOf:
    - type: string
      enum:
        - ignoreIfKeyAbsent
    - type: string

Also here.

GroundingFilterId is any since the spec provides no type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the current state was inspired from this discussion. But in OpenAPI 3.0, the anyOf actually get's completely ignored here as far as I understand, and this is also how the Swagger Editor behaves. It seem there hasn't been much progress on the matter since 😄

Using the above suggestion leads to problems with the Java generator. It can't deal with this at the moment, and it looks similar for the Python.

But I'm also not sure how beneficial this would be. Ultimately, the above is semantically equivalent to just writing type: string. So the only purpose is documentation, which might as well be moved to a description.

➡️ I think we should just remove this from the spec and either declare string with a description, or declare the enum and expect clients to use the appropriate generator option enumUnknownDefaultCase.

Copy link
Contributor Author

@deekshas8 deekshas8 Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this actually works in TS ;) The type that gets generated is:

export type SearchSelectOptionEnum = 'ignoreIfKeyAbsent' | string;

The advantage being its extensible and provides autocompletiong. The current 'ignoreIfKeyAbsent' | any; causes loss of type safety.
But if this suggestion doesn't work well with Java/Python, I understand it doesnt make sense then.

Regarding the solution, I also discussed with @tomfrenken. Having string type with a description sounds like a clearer solution. The SDK generator atleast doesn;t support enumUnknownDefaultCase.

For the GroundingFilterId, I was also not sure for what they do with this id and what is it supposed to be (since the spec has no type, it gets resolved to any), and therefore didn;t introduce generating a random guid.
I agree with the suggestion that it should be optional in the spec and if not provided by the user, the service should handle it. We can also do it if they agree not to do it on their end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened a PR in orchestration repo to fix the GroundingFilterId type to string.

@ZhongpinWang ZhongpinWang dismissed their stale review January 3, 2025 12:06

Discussed

Copy link
Member

@MatKuhr MatKuhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise LGTM, assuming the azure filter stuff was only moved but not modified

packages/orchestration/README.md Outdated Show resolved Hide resolved
packages/orchestration/src/orchestration-types.ts Outdated Show resolved Hide resolved
Copy link
Contributor

@jjtang1985 jjtang1985 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTGM.
We need to update the sample as well.

type: 'document_grounding_service',
config: {
filters: [
grounding: buildDocumentGroundingConfig(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same updates can be applied to the sample app

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants