fix: ai prevent python async + max db schema length #3440

Merged 1 commit on Mar 19, 2024
4 changes: 3 additions & 1 deletion backend/windmill-worker/src/graphql_executor.rs
@@ -80,7 +80,9 @@ pub async fn do_graphql(
}));

if let Some(token) = &api.bearer_token {
request = request.bearer_auth(token.as_str());
if token.len() > 0 {
request = request.bearer_auth(token.as_str());
}
}
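The guard above skips `bearer_auth` when the token is an empty string, so no `Authorization` header with an empty bearer value is sent. A minimal TypeScript sketch of the same idea follows; `buildHeaders` is a hypothetical helper for illustration only and is not part of this PR or of the Rust worker.

```typescript
// Hypothetical helper (not in the PR): mirrors the Rust guard in TypeScript.
function buildHeaders(
	bearerToken?: string,
	customHeaders: Record<string, string> = {}
): Record<string, string> {
	const headers: Record<string, string> = {}
	// Only attach the Authorization header when a token is present and non-empty.
	if (bearerToken !== undefined && bearerToken.length > 0) {
		headers['Authorization'] = `Bearer ${bearerToken}`
	}
	// Custom headers are applied afterwards, as in the Rust function's ordering.
	for (const [name, value] of Object.entries(customHeaders)) {
		headers[name] = value
	}
	return headers
}
```

With this sketch, `buildHeaders('')` returns an object without an `Authorization` key, while `buildHeaders('abc')` sets it to `Bearer abc`.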

if let Some(headers) = &api.custom_headers {
2 changes: 1 addition & 1 deletion frontend/src/lib/components/DBSchemaExplorer.svelte
@@ -93,7 +93,7 @@
</ToggleButtonGroup>
{/if}
{#if dbSchema.lang === 'graphql'}
<GraphqlSchemaViewer code={formatGraphqlSchema(dbSchema)} class="h-full" />
<GraphqlSchemaViewer code={formatGraphqlSchema(dbSchema.schema)} class="h-full" />
{:else}
<ObjectViewer json={formatSchema(dbSchema)} pureViewer collapseLevel={1} />
{/if}
@@ -1,7 +1,13 @@
import { JobService, Preview } from '$lib/gen'
import type { DBSchema, DBSchemas, GraphqlSchema, SQLSchema } from '$lib/stores'
import { buildClientSchema, getIntrospectionQuery, printSchema } from 'graphql'
import {
buildClientSchema,
getIntrospectionQuery,
printSchema,
type IntrospectionQuery
} from 'graphql'
import { tryEvery } from '$lib/utils'
import { stringifySchema } from '$lib/components/copilot/lib'

export enum ColumnIdentity {
ByDefault = 'By Default',
@@ -442,21 +448,29 @@ export async function getDbSchemas(
const schema =
processingFn !== undefined ? processingFn(testResult.result) : testResult.result

dbSchemas[resourcePath] = {
const dbSchema = {
lang: resourceTypeToLang(resourceType) as SQLSchema['lang'],
schema,
publicOnly: !!schema.public || !!schema.PUBLIC || !!schema.dbo
}
dbSchemas[resourcePath] = {
...dbSchema,
stringified: stringifySchema(dbSchema)
}
} else {
if (typeof testResult.result !== 'object' || !('__schema' in testResult.result)) {
console.error('Invalid GraphQL schema')

errorCallback('Invalid GraphQL schema')
} else {
dbSchemas[resourcePath] = {
lang: 'graphql',
const dbSchema = {
lang: 'graphql' as GraphqlSchema['lang'],
schema: testResult.result
}
dbSchemas[resourcePath] = {
...dbSchema,
stringified: stringifySchema(dbSchema)
}
}
}
}
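The GraphQL branch above accepts a result only if it is an object containing `__schema`. Note that `typeof null` is also `'object'`, and the `in` operator throws on `null`. A slightly stricter check could look like the sketch below; `isIntrospectionResult` is a hypothetical name and this variant is not the PR's code.

```typescript
// Hypothetical type guard (not in the PR): same check as the diff, plus a null test.
function isIntrospectionResult(result: unknown): result is { __schema: unknown } {
	return typeof result === 'object' && result !== null && '__schema' in result
}
```

A caller would report 'Invalid GraphQL schema' whenever this guard returns `false`, including for a `null` result.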
@@ -486,18 +500,20 @@ export async function getDbSchemas(
})
}

export function formatSchema(dbSchema: DBSchema) {
if (dbSchema.lang !== 'graphql' && dbSchema.publicOnly) {
export function formatSchema(dbSchema: {
lang: SQLSchema['lang']
schema: SQLSchema['schema']
publicOnly: SQLSchema['publicOnly']
}) {
if (dbSchema.publicOnly) {
return dbSchema.schema.public || dbSchema.schema.PUBLIC || dbSchema.schema.dbo || dbSchema
} else if (dbSchema.lang === 'mysql' && Object.keys(dbSchema.schema).length === 1) {
return dbSchema.schema[Object.keys(dbSchema.schema)[0]]
} else {
return dbSchema.schema
}
}

export function formatGraphqlSchema(dbSchema: GraphqlSchema): string {
return printSchema(buildClientSchema(dbSchema.schema))
export function formatGraphqlSchema(schema: IntrospectionQuery): string {
return printSchema(buildClientSchema(schema))
}

export function getFieldType(type: string, databaseType: DbType) {
29 changes: 25 additions & 4 deletions frontend/src/lib/components/copilot/ScriptGen.svelte
@@ -1,7 +1,7 @@
<script lang="ts">
import { Button } from '../common'

import { SUPPORTED_LANGUAGES, copilot } from './lib'
import { MAX_SCHEMA_LENGTH, SUPPORTED_LANGUAGES, addThousandsSeparator, copilot } from './lib'
import type { SupportedLanguage } from '$lib/common'
import { sendUserToast } from '$lib/toast'
import type Editor from '../Editor.svelte'
@@ -19,10 +19,20 @@
import LoadingIcon from '../apps/svelte-select/lib/LoadingIcon.svelte'
import { sleep } from '$lib/utils'
import { autoPlacement } from '@floating-ui/core'
import { Ban, Bot, Check, ExternalLink, HistoryIcon, Wand2, X } from 'lucide-svelte'
import {
AlertTriangle,
Ban,
Bot,
Check,
ExternalLink,
HistoryIcon,
Wand2,
X
} from 'lucide-svelte'
import { fade } from 'svelte/transition'
import { isInitialCode } from '$lib/script_helpers'
import { twMerge } from 'tailwind-merge'
import Popover from '../Popover.svelte'

// props
export let iconOnly: boolean = false
@@ -406,13 +416,24 @@

{#if ['postgresql', 'mysql', 'snowflake', 'bigquery', 'mssql', 'graphql'].includes(lang) && dbSchema?.lang === lang}
<div class="flex flex-row items-center justify-between gap-2 w-96">
<div class="flex flex-row items-center">
<div class="flex flex-row items-center gap-1">
<p class="text-xs text-secondary">
Context: {lang === 'graphql' ? 'GraphQL' : 'DB'} schema
</p>
<Tooltip>
<Tooltip placement="top">
We pass the selected schema to GPT-4 Turbo for better script generation.
</Tooltip>
{#if dbSchema.stringified.length > MAX_SCHEMA_LENGTH}
<Popover notClickable placement="top">
<AlertTriangle size={16} class="text-yellow-500" />
<svelte:fragment slot="text">
The schema is about {addThousandsSeparator(dbSchema.stringified.length / 3.5)}
tokens. To avoid exceeding the model's context length, it will be truncated to
{addThousandsSeparator(MAX_SCHEMA_LENGTH / 3.5)}
tokens.
</svelte:fragment>
</Popover>
{/if}
</div>
{#if dbSchema.lang !== 'graphql' && (dbSchema.schema?.public || dbSchema.schema?.PUBLIC || dbSchema.schema?.dbo)}
<ToggleButtonGroup class="w-auto shrink-0" bind:selected={dbSchema.publicOnly}>
91 changes: 55 additions & 36 deletions frontend/src/lib/components/copilot/lib.ts
@@ -2,7 +2,7 @@ import { OpenAI } from 'openai'
import { OpenAPI, ResourceService, Script } from '../../gen'
import type { Writable } from 'svelte/store'

import type { DBSchema } from '$lib/stores'
import type { DBSchema, GraphqlSchema, SQLSchema } from '$lib/stores'
import { formatResourceTypes } from './utils'

import { EDIT_CONFIG, FIX_CONFIG, GEN_CONFIG } from './prompts'
@@ -147,53 +147,72 @@ export async function addResourceTypes(scriptOptions: CopilotOptions, prompt: st
return prompt
}

export const MAX_SCHEMA_LENGTH = 100000 * 3.5

export function addThousandsSeparator(n: number) {
return n.toFixed().replace(/\B(?=(\d{3})+(?!\d))/g, "'")
}
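The limit above is expressed in characters: 100000 tokens times an assumed 3.5 characters per token. The sketch below restates that arithmetic in isolation. `CHARS_PER_TOKEN` and `estimateTokens` are names introduced here for illustration, and the 3.5 ratio is the diff's heuristic rather than an exact tokenizer count.

```typescript
// Assumption taken from the diff: roughly 3.5 characters per token.
const CHARS_PER_TOKEN = 3.5
const MAX_SCHEMA_LENGTH = 100000 * CHARS_PER_TOKEN // limit in characters

// Same regex as in the diff: put an apostrophe before each trailing group of three digits.
function addThousandsSeparator(n: number): string {
	return n.toFixed().replace(/\B(?=(\d{3})+(?!\d))/g, "'")
}

// Hypothetical helper (not in the PR): approximate token count of a string.
function estimateTokens(text: string): number {
	return text.length / CHARS_PER_TOKEN
}

console.log(addThousandsSeparator(MAX_SCHEMA_LENGTH / CHARS_PER_TOKEN)) // prints 100'000
console.log(addThousandsSeparator(estimateTokens('x'.repeat(7000)))) // prints 2'000
```

Because `toFixed()` is called without arguments, fractional estimates are rounded to a whole number before the separators are inserted.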

export function stringifySchema(
dbSchema: Omit<SQLSchema, 'stringified'> | Omit<GraphqlSchema, 'stringified'>
) {
const { schema, lang } = dbSchema
if (lang === 'graphql') {
let graphqlSchema = printSchema(buildClientSchema(schema))
return graphqlSchema
} else {
let smallerSchema: {
[schemaKey: string]: {
[tableKey: string]: Array<[string, string, boolean, string?]>
}
} = {}
for (const schemaKey in schema) {
smallerSchema[schemaKey] = {}
for (const tableKey in schema[schemaKey]) {
smallerSchema[schemaKey][tableKey] = []
for (const colKey in schema[schemaKey][tableKey]) {
const col = schema[schemaKey][tableKey][colKey]
const p: [string, string, boolean, string?] = [colKey, col.type, col.required]
if (col.default) {
p.push(col.default)
}
smallerSchema[schemaKey][tableKey].push(p)
}
}
}

let finalSchema: typeof smallerSchema | (typeof smallerSchema)['schemaKey'] = smallerSchema
if (dbSchema.publicOnly) {
finalSchema =
smallerSchema.public || smallerSchema.PUBLIC || smallerSchema.dbo || smallerSchema
} else if (lang === 'mysql' && Object.keys(smallerSchema).length === 1) {
finalSchema = smallerSchema[Object.keys(smallerSchema)[0]]
}
return JSON.stringify(finalSchema)
}
}
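For SQL schemas, `stringifySchema` shrinks each column to a tuple `[name, type, required, default?]` before serializing. A self-contained sketch of that per-table step is shown below; the `Column` type and the `compactTable` helper are assumptions made for illustration and are not the types from `$lib/stores`.

```typescript
// Assumed column shape for illustration; the real type lives in $lib/stores.
type Column = { type: string; required: boolean; default?: string }
type CompactColumn = [string, string, boolean, string?]

// Hypothetical helper (not in the PR): converts one table to the compact tuple form.
function compactTable(table: Record<string, Column>): CompactColumn[] {
	const columns: CompactColumn[] = []
	for (const colKey in table) {
		const col = table[colKey]
		const tuple: CompactColumn = [colKey, col.type, col.required]
		// The default is appended only when it is set, as in the diff.
		if (col.default) {
			tuple.push(col.default)
		}
		columns.push(tuple)
	}
	return columns
}

const users = {
	id: { type: 'int4', required: true },
	name: { type: 'text', required: false, default: 'anon' }
}
console.log(JSON.stringify(compactTable(users)))
// prints [["id","int4",true],["name","text",false,"anon"]]
```

Dropping the property names from every column is what keeps the serialized schema short relative to the full object form.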

function addDBSChema(scriptOptions: CopilotOptions, prompt: string) {
const { dbSchema, language } = scriptOptions
if (
dbSchema &&
['postgresql', 'mysql', 'snowflake', 'bigquery', 'mssql', 'graphql'].includes(language) && // make sure we are using a SQL/query language
language === dbSchema.lang // make sure we are using the same language as the schema
) {
const { schema, lang } = dbSchema
if (lang === 'graphql') {
const graphqlSchema = printSchema(buildClientSchema(schema))
prompt =
prompt +
'\nHere is the GraphQL schema: <schema>\n' +
JSON.stringify(graphqlSchema) +
'\n</schema>'
} else {
let smallerSchema: {
[schemaKey: string]: {
[tableKey: string]: Array<[string, string, boolean, string?]>
}
} = {}
for (const schemaKey in schema) {
smallerSchema[schemaKey] = {}
for (const tableKey in schema[schemaKey]) {
smallerSchema[schemaKey][tableKey] = []
for (const colKey in schema[schemaKey][tableKey]) {
const col = schema[schemaKey][tableKey][colKey]
const p: [string, string, boolean, string?] = [colKey, col.type, col.required]
if (col.default) {
p.push(col.default)
}
smallerSchema[schemaKey][tableKey].push(p)
}
}
let { stringified } = dbSchema
if (dbSchema.lang === 'graphql') {
if (stringified.length > MAX_SCHEMA_LENGTH) {
stringified = stringified.slice(0, MAX_SCHEMA_LENGTH) + '...'
}

let finalSchema: typeof smallerSchema | (typeof smallerSchema)['schemaKey'] = smallerSchema
if (dbSchema.publicOnly) {
finalSchema =
smallerSchema.public || smallerSchema.PUBLIC || smallerSchema.dbo || smallerSchema
} else if (lang === 'mysql' && Object.keys(smallerSchema).length === 1) {
finalSchema = smallerSchema[Object.keys(smallerSchema)[0]]
prompt = prompt + '\nHere is the GraphQL schema: <schema>\n' + stringified + '\n</schema>'
} else {
if (stringified.length > MAX_SCHEMA_LENGTH) {
stringified = stringified.slice(0, MAX_SCHEMA_LENGTH) + '...'
}
prompt =
prompt +
"\nHere's the database schema, each column is in the format [name, type, required, default?]: <dbschema>\n" +
JSON.stringify(finalSchema) +
stringified +
'\n</dbschema>'
}
}
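Both branches of the new `addDBSChema` apply the same truncation before wrapping the schema in its tag. The sketch below isolates that step; `truncateForPrompt` is a hypothetical helper, not a function from the PR, and the limit is passed in as a parameter so the example stays self-contained.

```typescript
// Hypothetical helper (not in the PR): cut the schema to maxLength characters
// and append '...' so the model can tell the text was shortened.
function truncateForPrompt(stringified: string, maxLength: number): string {
	if (stringified.length > maxLength) {
		return stringified.slice(0, maxLength) + '...'
	}
	return stringified
}

console.log(truncateForPrompt('abcdef', 4)) // prints abcd...
console.log(truncateForPrompt('abc', 4)) // prints abc
```

The appended `'...'` means a truncated string is three characters longer than the limit, and a JSON-stringified schema cut this way is no longer valid JSON, which is acceptable here because it is only used as prompt context.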
1 change: 1 addition & 0 deletions frontend/src/lib/components/copilot/prompts/edit.yaml
@@ -18,6 +18,7 @@ prompts:
```
<contextual_information>
You have to write a function in python called "main". Specify the parameter types. Do not call the main function. You should generally return the result.
The "main" function cannot be async. If you need to use async code, you can use the asyncio library.
You can take as parameters resources which are dictionaries containing credentials or configuration information. For Windmill to correctly detect the resources to be passed, the resource type name has to be exactly as specified in the following list:
<resourceTypes>
{resourceTypes}
@@ -2,7 +2,7 @@ export const EDIT_PROMPT = {
"system": "You are a helpful coding assistant for Windmill, a developer platform for running scripts. You modify code as instructed by the user. Each user message includes some contextual information which should guide your answer.\nOnly output code. Wrap the code in a code block. \nPut explanations directly in the code as comments.\nReturn the complete modified code.\n\nHere's how interactions have to look like:\nuser: {sample_question}\nassistant: ```language\n{code}\n```",
"prompts": {
"python3": {
"prompt": "Here's my python3 code: \n```python\n{code}\n```\n<contextual_information>\nYou have to write a function in python called \"main\". Specify the parameter types. Do not call the main function. You should generally return the result.\nYou can take as parameters resources which are dictionaries containing credentials or configuration information. For Windmill to correctly detect the resources to be passed, the resource type name has to be exactly as specified in the following list:\n<resourceTypes>\n{resourceTypes}\n</resourceTypes>\nYou need to define the type of the resources that are needed before the main function, but only include them if they are actually needed to achieve the function purpose.\nThe resource type name has to be exactly as specified (has to be IN LOEWRCASE). If the type name conflicts with any imported methods, you have to rename the imported method with the conflicting name.\n<contextual_information>\nMy instructions: {description}"
"prompt": "Here's my python3 code: \n```python\n{code}\n```\n<contextual_information>\nYou have to write a function in python called \"main\". Specify the parameter types. Do not call the main function. You should generally return the result.\nThe \"main\" function cannot be async. If you need to use async code, you can use the asyncio library.\nYou can take as parameters resources which are dictionaries containing credentials or configuration information. For Windmill to correctly detect the resources to be passed, the resource type name has to be exactly as specified in the following list:\n<resourceTypes>\n{resourceTypes}\n</resourceTypes>\nYou need to define the type of the resources that are needed before the main function, but only include them if they are actually needed to achieve the function purpose.\nThe resource type name has to be exactly as specified (has to be IN LOEWRCASE). If the type name conflicts with any imported methods, you have to rename the imported method with the conflicting name.\n<contextual_information>\nMy instructions: {description}"
},
"deno": {
"prompt": "Here's my TypeScript code in a deno running environment:\n```typescript\n{code}\n```\n<contextual_information>\nWe have to export a \"main\" function like this: \"export async function main(...)\" and specify the parameter types but do not call it. You should generally return the result.\nIf needed, the standard fetch method is available globally, do not import it.\nYou can take as parameters resources which are dictionaries containing credentials or configuration information. Name the resource parameters like this: \"{resource_type}Resource\". \nThe resource type name has to be exactly as specified.\n<resourceTypes>\n{resourceTypes}\n</resourceTypes>\nOnly define the type for resources that are actually needed to achieve the function purpose. If the type name conflicts with the imported object, rename the imported object NOT THE TYPE.\n</contextual_information>\nMy instructions: {description}"
1 change: 1 addition & 0 deletions frontend/src/lib/components/copilot/prompts/fix.yaml
@@ -20,6 +20,7 @@ prompts:
```
<contextual_information>
You have to write a function in python called "main". Specify the parameter types. Do not call the main function. You should generally return the result.
The "main" function cannot be async. If you need to use async code, you can use the asyncio library.
You can take as parameters resources which are dictionaries containing credentials or configuration information. For Windmill to correctly detect the resources to be passed, the resource type name has to be exactly as specified in the following list:
<resourceTypes>
{resourceTypes}
2 changes: 1 addition & 1 deletion frontend/src/lib/components/copilot/prompts/fixPrompt.ts
@@ -2,7 +2,7 @@ export const FIX_PROMPT = {
"system": "You are a helpful coding assistant for Windmill, a developer platform for running scripts. You fix the code shared by the user. Each user message includes some contextual information which should guide your answer.\nOnly output code. Wrap the code in a code block. \nExplain the error and the fix after generating the code inside an <explanation> tag.\nAlso put explanations directly in the code as comments.\nReturn the complete fixed code.\n\nHere's how interactions have to look like:\nuser: {sample_question}\nassistant: ```language\n{code}\n```\n<explanation>{explanation}</explanation>",
"prompts": {
"python3": {
"prompt": "Here's my python3 code: \n```python\n{code}\n```\n<contextual_information>\nYou have to write a function in python called \"main\". Specify the parameter types. Do not call the main function. You should generally return the result.\nYou can take as parameters resources which are dictionaries containing credentials or configuration information. For Windmill to correctly detect the resources to be passed, the resource type name has to be exactly as specified in the following list:\n<resourceTypes>\n{resourceTypes}\n</resourceTypes>\nYou need to define the type of the resources that are needed before the main function, but only include them if they are actually needed to achieve the function purpose.\nThe resource type name has to be exactly as specified (has to be IN LOEWRCASE). If the type name conflicts with any imported methods, you have to rename the imported method with the conflicting name.\n<contextual_information>\nI get the following error: {error}\nFix my code."
"prompt": "Here's my python3 code: \n```python\n{code}\n```\n<contextual_information>\nYou have to write a function in python called \"main\". Specify the parameter types. Do not call the main function. You should generally return the result.\nThe \"main\" function cannot be async. If you need to use async code, you can use the asyncio library.\nYou can take as parameters resources which are dictionaries containing credentials or configuration information. For Windmill to correctly detect the resources to be passed, the resource type name has to be exactly as specified in the following list:\n<resourceTypes>\n{resourceTypes}\n</resourceTypes>\nYou need to define the type of the resources that are needed before the main function, but only include them if they are actually needed to achieve the function purpose.\nThe resource type name has to be exactly as specified (has to be IN LOEWRCASE). If the type name conflicts with any imported methods, you have to rename the imported method with the conflicting name.\n<contextual_information>\nI get the following error: {error}\nFix my code."
},
"deno": {
"prompt": "Here's my TypeScript code in a deno running environment:\n```typescript\n{code}\n```\n<contextual_information>\nWe have to export a \"main\" function like this: \"export async function main(...)\" and specify the parameter types but do not call it. You should generally return the result.\nIf needed, the standard fetch method is available globally, do not import it.\nYou can take as parameters resources which are dictionaries containing credentials or configuration information. Name the resource parameters like this: \"{resource_type}Resource\".\nThe following resource types are available:\n<resourceTypes>\n{resourceTypes}\n</resourceTypes>\nOnly define the type for resources that are actually needed to achieve the function purpose. The resource type name has to be exactly as specified. If the type name conflicts with the imported object, rename the imported object NOT THE TYPE.\n</contextual_information>\nI get the following error: {error}\nFix my code."
1 change: 1 addition & 0 deletions frontend/src/lib/components/copilot/prompts/gen.yaml
@@ -13,6 +13,7 @@ prompts:
prompt: |-
<contextual_information>
You have to write a function in Python called "main". Specify the parameter types. Do not call the main function. You should generally return the result.
The "main" function cannot be async. If you need to use async code, you can use the asyncio library.
You can take as parameters resources which are dictionaries containing credentials or configuration information. For Windmill to correctly detect the resources to be passed, the resource type name has to be exactly as specified in the following list:
<resourceTypes>
{resourceTypes}