WIP Mitigate hallucinations with codeblocks containing Typescript/Web Components/Javascript #64
Why
There is nothing more costly to productivity than LLM hallucinations within codeblocks that are accepted into a codebase.
We have mitigated that risk with GraphQL. We need to do the same with TypeScript, web components, and JavaScript codeblocks using Shopify's APIs.
What
Even if users aren't writing TypeScript (for example, web components with Polaris app home), we can still validate against the types of the TypeScript APIs that their code uses directly or indirectly.
This introduces a new `validate_typescript_codeblocks` MCP tool that accepts a TS package name, which is used to detect hallucinations and errors in codeblocks that LLMs generated or modified. In the case of app home, it will accept `@shopify/app-bridge-ui-types`, the TypeScript library that app home uses indirectly, as our TS source of truth. We'll configure `learn_shopify_api`'s response to require LLMs to use this tool to validate any codeblocks they would return to a user.
WIP
What's still in question is how we will use TS.
This PR uses Typia to validate TS without relying on the user's machine. I still need to stress test the implementation with more than the 3 weak-sauce evals I've been testing it with.
The previous idea was to add a temporary TS file and use the user's CLI to run the TypeScript compiler. I worry about how many more things can go wrong in that case: for example, the user may not have TypeScript installed because they prefer JS/web components. This could send an agent spiraling and modifying the user's machine in ways that aren't necessary. It also risks not cleaning up after itself (it temporarily creates a TS file to run the TypeScript compile command).
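For reference, the cleanup risk in the temp-file approach could be contained with a `try`/`finally` wrapper. This is only a sketch under assumptions, not code from this PR: `withTempTsFile`, the `codeblock-check-` directory prefix, and the `snippet.ts` file name are all hypothetical, and the callback is assumed to be synchronous (with an async callback, the `finally` would run before the check finishes).

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper (not from this PR): writes `code` to a throwaway .ts file,
// hands the path to a synchronous `check` callback, and always removes the
// temp directory afterwards.
function withTempTsFile<T>(code: string, check: (filePath: string) => T): T {
  // mkdtempSync creates a uniquely named directory, so concurrent runs don't collide.
  const dir = mkdtempSync(join(tmpdir(), "codeblock-check-"));
  const filePath = join(dir, "snippet.ts");
  try {
    writeFileSync(filePath, code, "utf8");
    return check(filePath);
  } finally {
    // Runs whether `check` returns or throws, so no stray file is left behind.
    rmSync(dir, { recursive: true, force: true });
  }
}

// Usage: the file exists while the callback runs and is gone afterwards.
let usedPath = "";
const existedDuring = withTempTsFile("const x: number = 1;", (filePath) => {
  usedPath = filePath;
  return existsSync(filePath);
});
console.log(existedDuring, existsSync(usedPath)); // logs: true false
```

In the temp-file approach, the callback is where the compiler would be invoked, for example by shelling out to `tsc --noEmit` on the file. This only addresses the cleanup concern. It does nothing for the case where TypeScript isn't installed on the user's machine.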