chore(tools): POC to consolidate immutable ddl tools while preserving the accuracy #354

Draft
wants to merge 54 commits into
base: chore/issue-307-proposal-2
from

Conversation

himanshusinghs
Collaborator

Proposed changes

Checklist

@himanshusinghs himanshusinghs changed the base branch from main to chore/issue-307-proposal-2 July 10, 2025 18:31
@himanshusinghs himanshusinghs force-pushed the poc/tool-consolidation branch from e20104b to e846520 Compare July 10, 2025 20:54
LangChain's ToolCalling agent did not return structured tool call
responses, and different model providers returned entirely different
tool calls for the same tool definition. That variability made it
impossible to establish any accuracy baseline.

Vercel's AI SDK addresses that problem: the tool call responses so far
have always been well structured.

This commit replaces the LangChain-based implementation with one based
on Vercel's AI SDK.
While writing test cases, I realized that writing and maintaining mocks is too much duplicated effort. So instead of having only a mocked MCP client, this commit introduces a real MCP client that talks to our MCP server and is still mockable.

We now set up a real MCP client with test data in a MongoDB database spun up for the test suites. Mocking is still an option, but we will likely never need it.
Introduces the following required environment variables:
- MDB_ACCURACY_RUN_ID: The accuracy run ID
- MDB_ACCURACY_MDB_URL: The connection string of the MongoDB instance where the snapshots will be stored
- MDB_ACCURACY_MDB_DB: The database for snapshots
- MDB_ACCURACY_MDB_COLLECTION: The collection for snapshots
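As a rough illustration of how these four variables might be read and validated up front, here is a minimal sketch. Only the variable names come from the commit message; the `AccuracySnapshotConfig` type, the `readAccuracyConfig` function and the error format are hypothetical and not taken from this PR.

```typescript
// Hypothetical config shape; only the env variable names are from the commit.
interface AccuracySnapshotConfig {
  runId: string;
  mdbUrl: string;
  mdbDb: string;
  mdbCollection: string;
}

type Env = Record<string, string | undefined>;

const REQUIRED_VARIABLES = [
  "MDB_ACCURACY_RUN_ID",
  "MDB_ACCURACY_MDB_URL",
  "MDB_ACCURACY_MDB_DB",
  "MDB_ACCURACY_MDB_COLLECTION",
] as const;

// Fails fast with the full list of missing variables instead of
// failing later, in the middle of an accuracy run.
function readAccuracyConfig(env: Env): AccuracySnapshotConfig {
  const missing = REQUIRED_VARIABLES.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    runId: env["MDB_ACCURACY_RUN_ID"] as string,
    mdbUrl: env["MDB_ACCURACY_MDB_URL"] as string,
    mdbDb: env["MDB_ACCURACY_MDB_DB"] as string,
    mdbCollection: env["MDB_ACCURACY_MDB_COLLECTION"] as string,
  };
}
```

In a Node.js test setup this would presumably be called as `readAccuracyConfig(process.env)`; taking the environment as a parameter keeps the helper easy to test.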
himanshusinghs and others added 21 commits July 14, 2025 01:37
The new field `accuracyRunStatus` guards against cases where jest fails
partway through a run, for example because of LLM rate limit errors,
leaving a partially saved accuracy run. With `accuracyRunStatus` we can
safely look up the most recent runs whose `accuracyRunStatus` is done
and be sure the accuracy snapshot is complete.
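The lookup described above could be sketched roughly as follows. The field name `accuracyRunStatus` and the value "done" come from the commit message; the "in-progress" value, the `runId` and `createdOn` fields, and the function name are assumptions made for illustration.

```typescript
// "done" is from the commit message; "in-progress" is an assumed counterpart.
type AccuracyRunStatus = "in-progress" | "done";

// Hypothetical snapshot shape; only accuracyRunStatus is named in the commit.
interface AccuracyRun {
  runId: string;
  accuracyRunStatus: AccuracyRunStatus;
  createdOn: number; // assumed timestamp in milliseconds
}

// Returns the most recent run that finished completely, ignoring runs
// that were interrupted and therefore hold only partial results.
function latestCompletedRun(runs: AccuracyRun[]): AccuracyRun | undefined {
  return runs
    .filter((run) => run.accuracyRunStatus === "done")
    .sort((a, b) => b.createdOn - a.createdOn)[0];
}
```

Against a real MongoDB collection the same idea would be a query filtered on `accuracyRunStatus` and sorted by a timestamp; the in-memory version is used here only to keep the sketch self-contained.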
…nts.

1. Removes unnecessary suite description from tests
2. Removes the test suite name from the storage as well
3. Centralizes the constants used everywhere in the SDK
4. Adds clarifying comments and docs wherever necessary
5. Adds tests for accuracy-scorer
Instead of storing multiple documents per accuracy test run (one for
each prompt+model response), we now store a single document per accuracy
result, with all the prompt+model responses nested under it.
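To make the change in storage shape concrete, here is a small sketch that folds flat per-response records into one document per run. All type and field names (`FlatResult`, `AccuracyResultDocument`, `promptResults`, `toolCallingAccuracy`) are hypothetical; the commit message does not show the actual schema.

```typescript
// Hypothetical per-response payload.
interface PromptResult {
  prompt: string;
  model: string;
  toolCallingAccuracy: number;
}

// Old shape (assumed): one document per prompt+model response.
interface FlatResult extends PromptResult {
  runId: string;
}

// New shape (assumed): one document per run with responses nested.
interface AccuracyResultDocument {
  runId: string;
  promptResults: PromptResult[];
}

function nestByRun(flat: FlatResult[]): AccuracyResultDocument[] {
  const byRun = new Map<string, AccuracyResultDocument>();
  for (const { runId, ...promptResult } of flat) {
    let doc = byRun.get(runId);
    if (!doc) {
      doc = { runId, promptResults: [] };
      byRun.set(runId, doc);
    }
    doc.promptResults.push(promptResult);
  }
  return Array.from(byRun.values());
}
```

One document per run means a run can be read or replaced in a single operation, at the cost of larger documents.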
@himanshusinghs himanshusinghs force-pushed the chore/issue-307-proposal-2 branch from f666014 to 1cc93f2 Compare July 13, 2025 23:37
@himanshusinghs himanshusinghs force-pushed the poc/tool-consolidation branch from e846520 to 2062d6f Compare July 13, 2025 23:43
1. Use commit SHAs for GitHub Actions
2. Run the workflow on pushes to main as well
3. Use ai-sdk/google instead of the privately published package
@himanshusinghs himanshusinghs force-pushed the poc/tool-consolidation branch from 2062d6f to cf32564 Compare July 13, 2025 23:56
Consolidates list-databases, list-collections, collection-indexes and
collection-schema.

The tool calling accuracy dropped from 100% to 75%. The LLM consistently
mistook the command name for listDatabases instead of list-databases,
and only corrected course once shown the error.
In the previous commit, our schema description was not clear enough for
the LLM to provide the correct command name, and it hallucinated names
such as listDatabases instead of list-databases.
This commit adjusts the descriptions to achieve 100% accuracy for the
same prompts.
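The kind of description change discussed here can be illustrated with a small sketch. The four command names come from the commit message above; the description wording, the `validateCommand` helper and its error format are assumptions and not the schema actually used in this PR.

```typescript
// Command names as listed in the consolidation commit.
const COMMANDS = [
  "list-databases",
  "list-collections",
  "collection-indexes",
  "collection-schema",
] as const;
type Command = (typeof COMMANDS)[number];

// Hypothetical description that spells out the exact literals, so the
// model has less room to guess a camelCase variant.
const commandDescription =
  `The command to run. Must be exactly one of: ` +
  `${COMMANDS.map((c) => `"${c}"`).join(", ")}. ` +
  `Names are kebab-case, for example "list-databases" and not "listDatabases".`;

function isCommand(value: string): value is Command {
  return COMMANDS.some((c) => c === value);
}

function toKebabCase(value: string): string {
  return value.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();
}

type ValidationResult =
  | { ok: true; command: Command }
  | { ok: false; error: string };

// Rejects unknown names and, where possible, points at the kebab-case
// name the caller probably meant.
function validateCommand(value: string): ValidationResult {
  if (isCommand(value)) {
    return { ok: true, command: value };
  }
  const suggestion = toKebabCase(value);
  const hint = isCommand(suggestion) ? ` Did you mean "${suggestion}"?` : "";
  return { ok: false, error: `Unknown command "${value}".${hint}` };
}
```

An error hint like this only helps the model recover on a retry; the first-attempt accuracy measured in this PR depends on the description itself.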
@himanshusinghs himanshusinghs force-pushed the poc/tool-consolidation branch from cf32564 to be45e70 Compare July 13, 2025 23:59
@himanshusinghs himanshusinghs force-pushed the chore/issue-307-proposal-2 branch from 34dd207 to 562a2cb Compare July 15, 2025 10:25
1 participant