Commits
71 commits
57ce94e
version
mikecann Aug 13, 2025
0d563c2
wip
mikecann Aug 21, 2025
55db9ea
seems to work and output results
mikecann Aug 21, 2025
e286408
more logs
mikecann Aug 21, 2025
128e15e
wip
mikecann Aug 21, 2025
c841b72
pretty printing to console
mikecann Aug 21, 2025
fdd16f0
pulled out some reporting stuff into its own file
mikecann Aug 21, 2025
3699d07
logging pass or vail to console
mikecann Aug 21, 2025
308249f
fixed test run
mikecann Aug 21, 2025
ab955c0
updated readme
mikecann Aug 21, 2025
4573cb4
more minial logging
mikecann Aug 21, 2025
dd1b219
more wip
mikecann Aug 21, 2025
4378b8e
local mode and braintrust mode
mikecann Aug 21, 2025
c4ab0b8
makeing local mode null out the key
mikecann Aug 21, 2025
790f8b2
updated readme again
mikecann Aug 21, 2025
f32c004
catching potential future fails
mikecann Aug 21, 2025
5f41834
wip
mikecann Aug 21, 2025
ebeb9ca
wip
mikecann Aug 21, 2025
e9a8fff
grade last
mikecann Aug 21, 2025
2885d59
wip
mikecann Aug 21, 2025
a3ade49
tidied up grade_last
mikecann Aug 21, 2025
651617a
adding gpt5 mini
mikecann Aug 21, 2025
8640e8e
ifxed 000 and 001
mikecann Aug 21, 2025
0a74958
significantly improved graders
mikecann Aug 22, 2025
de25ab7
making it bomb out early
mikecann Aug 22, 2025
492aad1
making it a little more explicit so graders pass
mikecann Aug 22, 2025
d556060
updated tasks to be more explicit regarding function argument names
mikecann Aug 22, 2025
58d44a7
fixed this eval
mikecann Aug 22, 2025
06c987f
more expliti
mikecann Aug 22, 2025
85fed2b
updgrading 001
mikecann Aug 22, 2025
fb44b1b
fixed up these graders
mikecann Aug 22, 2025
ca3f488
trying to fix this one up
mikecann Aug 22, 2025
c27361e
updates tasks and removed brittle function spec check
mikecann Aug 22, 2025
d5ff605
added a couple of extra tests
mikecann Aug 22, 2025
0417849
making tasks more explicit
mikecann Aug 22, 2025
346c9a0
removed function spec checks
mikecann Aug 22, 2025
22f5086
updated tasks and actions
mikecann Aug 22, 2025
4bf90c9
added some more tests that are now needed
mikecann Aug 22, 2025
42924be
better script
mikecann Aug 22, 2025
fb539d4
replacing if statements with a match
mikecann Aug 25, 2025
1829352
putting logging behind an env var
mikecann Aug 25, 2025
5e64fca
moved logging
mikecann Aug 25, 2025
07ced69
fixed backend loading issues on windows
mikecann Aug 25, 2025
3d92c0c
Merge branch 'mikec/disable_braintrust' into mikec/upgrade_graders
mikecann Aug 25, 2025
f1525cf
adding the verbose logging back in
mikecann Aug 25, 2025
3a24f52
now grades answers
mikecann Aug 25, 2025
8d9b710
wip better idioms
mikecann Aug 25, 2025
f37953b
these arent needed
mikecann Aug 25, 2025
33524c9
fixed this task
mikecann Aug 25, 2025
824bb91
upgrading a few more tasks
mikecann Aug 26, 2025
a2f0fbb
model optionsal
mikecann Aug 26, 2025
145fb68
removing that testtidying
mikecann Aug 26, 2025
3870f06
specifying in the task that it needs to throw
mikecann Aug 26, 2025
8ecf174
removing some amiguity in the tasks
mikecann Aug 26, 2025
87aa49c
refined some tasks
mikecann Aug 26, 2025
0b87e68
added script to codegen all
mikecann Aug 26, 2025
1db3ee2
making sure I can run tsc on the graders
mikecann Aug 26, 2025
7dbdd92
all graders now pass ts checks
mikecann Aug 26, 2025
4ddb8d2
eval summary report a little nicer
mikecann Aug 26, 2025
6353e59
visualising local results
mikecann Aug 26, 2025
d7e63eb
wip visualise
mikecann Aug 26, 2025
33705f0
added more to the visualiser
mikecann Aug 26, 2025
fadf63a
wip
mikecann Aug 26, 2025
65e75ae
wip
mikecann Aug 26, 2025
e99868e
nicer
mikecann Aug 26, 2025
de2ec18
wip
mikecann Aug 26, 2025
d744756
wip
mikecann Aug 26, 2025
5aaa468
wip
mikecann Aug 26, 2025
c39412f
wip
mikecann Aug 26, 2025
887ffe2
reventing some log lines
mikecann Aug 29, 2025
60a1093
Merge branch 'main' into mikec/upgrade_graders
mikecann Aug 29, 2025
2 changes: 1 addition & 1 deletion eslint.config.mjs
@@ -3,7 +3,7 @@ import tsparser from "@typescript-eslint/parser";

 export default tseslint.config(
   {
-    ignores: ["**/convex/_generated/"],
+    ignores: ["**/convex/_generated/", "scripts/**", "evals/**/grader.test.ts"],
   },
   tseslint.configs.recommendedTypeChecked,
   {
206 changes: 63 additions & 143 deletions evals/000-fundamentals/000-empty_functions/grader.test.ts
@@ -1,169 +1,89 @@
 import { expect, test } from "vitest";
-import {
-  responseAdminClient,
-  responseClient,
-  compareSchema,
-  compareFunctionSpec,
-} from "../../../grader";
+import { responseAdminClient, responseClient } from "../../../grader";
 import { anyApi } from "convex/server";

-test("compare schema", async ({ skip }) => {
-  await compareSchema(skip);
-});
+test("empty public query", async () => {
+  expect(await responseClient.query(anyApi.index.emptyPublicQuery, {})).toBe(
+    null,
+  );

-test("compare function spec", async ({ skip }) => {
-  await compareFunctionSpec(skip);
-});
+  await expect(
+    responseClient.query(anyApi.index.emptyPublicQuery, { arg: "test" }),
+  ).rejects.toThrow(/ArgumentValidationError/);

-test("empty public query", async () => {
-  const result = await responseClient.query(anyApi.index.emptyPublicQuery, {});
-  expect(result).toBe(null);
+  await expect(
+    responseClient.mutation(anyApi.index.emptyPublicQuery, {}),
+  ).rejects.toBeDefined();

-  let error: any = undefined;
-  try {
-    await responseClient.query(anyApi.index.emptyPublicQuery, {
-      arg: "test",
-    });
-  } catch (e) {
-    error = e;
-  }
-  expect(error).toBeDefined();
-  expect(error.toString()).toContain("ArgumentValidationError");
-
-  error = undefined;
-  try {
-    await responseClient.mutation(anyApi.index.emptyPublicQuery, {});
-  } catch (e) {
-    error = e;
-  }
-  expect(error).toBeDefined();
-
-  error = undefined;
-  try {
-    await responseClient.action(anyApi.index.emptyPublicQuery, {});
-  } catch (e) {
-    error = e;
-  }
-  expect(error).toBeDefined();
+  await expect(
+    responseClient.action(anyApi.index.emptyPublicQuery, {}),
+  ).rejects.toBeDefined();
 });

 test("empty public mutation", async () => {
-  const result = await responseClient.mutation(
-    anyApi.index.emptyPublicMutation,
-    {},
-  );
-  expect(result).toBe(null);
+  expect(
+    await responseClient.mutation(anyApi.index.emptyPublicMutation, {}),
+  ).toBe(null);

-  let error: any = undefined;
-  try {
-    await responseClient.mutation(anyApi.index.emptyPublicMutation, {
+  await expect(
+    responseClient.mutation(anyApi.index.emptyPublicMutation, {
       arg: "test",
-    });
-  } catch (e) {
-    error = e;
-  }
-  expect(error).toBeDefined();
-  expect(error.toString()).toContain("ArgumentValidationError");
-
-  error = undefined;
-  try {
-    await responseClient.query(anyApi.index.emptyPublicMutation, {});
-  } catch (e) {
-    error = e;
-  }
-  expect(error).toBeDefined();
-
-  error = undefined;
-  try {
-    await responseClient.action(anyApi.index.emptyPublicMutation, {});
-  } catch (e) {
-    error = e;
-  }
-  expect(error).toBeDefined();
+    }),
+  ).rejects.toThrow(/ArgumentValidationError/);
+
+  await expect(
+    responseClient.query(anyApi.index.emptyPublicMutation, {}),
+  ).rejects.toBeDefined();
+
+  await expect(
+    responseClient.action(anyApi.index.emptyPublicMutation, {}),
+  ).rejects.toBeDefined();
 });

 test("empty public action", async () => {
-  const result = await responseClient.action(
-    anyApi.index.emptyPublicAction,
-    {},
+  expect(await responseClient.action(anyApi.index.emptyPublicAction, {})).toBe(
+    null,
   );
-  expect(result).toBe(null);

-  let error: any = undefined;
-  try {
-    await responseClient.action(anyApi.index.emptyPublicAction, {
-      arg: "test",
-    });
-  } catch (e) {
-    error = e;
-  }
-  expect(error).toBeDefined();
-  expect(error.toString()).toContain("ArgumentValidationError");
-
-  error = undefined;
-  try {
-    await responseClient.query(anyApi.index.emptyPublicAction, {});
-  } catch (e) {
-    error = e;
-  }
-  expect(error).toBeDefined();
-
-  error = undefined;
-  try {
-    await responseClient.mutation(anyApi.index.emptyPublicAction, {});
-  } catch (e) {
-    error = e;
-  }
-  expect(error).toBeDefined();
+  await expect(
+    responseClient.action(anyApi.index.emptyPublicAction, { arg: "test" }),
+  ).rejects.toThrow(/ArgumentValidationError/);
+
+  await expect(
+    responseClient.query(anyApi.index.emptyPublicAction, {}),
+  ).rejects.toBeDefined();
+
+  await expect(
+    responseClient.mutation(anyApi.index.emptyPublicAction, {}),
+  ).rejects.toBeDefined();
 });

 test("empty private query", async () => {
-  let error: any = undefined;
-  try {
-    await responseClient.query(anyApi.index.emptyPrivateQuery, {});
-  } catch (e) {
-    error = e;
-  }
-  expect(error).toBeDefined();
-  expect(error.toString()).toContain("Could not find public function");
-
-  const result = await responseAdminClient.query(
-    anyApi.index.emptyPrivateQuery,
-    {},
-  );
-  expect(result).toBe(null);
+  await expect(
+    responseClient.query(anyApi.index.emptyPrivateQuery, {}),
+  ).rejects.toThrow(/Could not find public function/);
+
+  expect(
+    await responseAdminClient.query(anyApi.index.emptyPrivateQuery, {}),
+  ).toBe(null);
 });

 test("empty private mutation", async () => {
-  let error: any = undefined;
-  try {
-    await responseClient.mutation(anyApi.index.emptyPrivateMutation, {});
-  } catch (e) {
-    error = e;
-  }
-  expect(error).toBeDefined();
-  expect(error.toString()).toContain("Could not find public function");
-
-  const result = await responseAdminClient.mutation(
-    anyApi.index.emptyPrivateMutation,
-    {},
-  );
-  expect(result).toBe(null);
+  await expect(
+    responseClient.mutation(anyApi.index.emptyPrivateMutation, {}),
+  ).rejects.toThrow(/Could not find public function/);
+
+  expect(
+    await responseAdminClient.mutation(anyApi.index.emptyPrivateMutation, {}),
+  ).toBe(null);
 });

 test("empty private action", async () => {
-  let error: any = undefined;
-  try {
-    await responseClient.action(anyApi.index.emptyPrivateAction, {});
-  } catch (e) {
-    error = e;
-  }
-  expect(error).toBeDefined();
-  expect(error.toString()).toContain("Could not find public function");
-
-  const result = await responseAdminClient.action(
-    anyApi.index.emptyPrivateAction,
-    {},
-  );
-  expect(result).toBe(null);
+  await expect(
+    responseClient.action(anyApi.index.emptyPrivateAction, {}),
+  ).rejects.toThrow(/Could not find public function/);
+
+  expect(
+    await responseAdminClient.action(anyApi.index.emptyPrivateAction, {}),
+  ).toBe(null);
 });
93 changes: 85 additions & 8 deletions evals/000-fundamentals/001-basic_schema/grader.test.ts
@@ -1,16 +1,93 @@
 import { expect, test } from "vitest";
 import {
   responseAdminClient,
-  responseClient,
-  compareSchema,
-  compareFunctionSpec,
+  getSchema,
+  addDocuments,
+  listTable,
+  deleteAllDocuments,
 } from "../../../grader";
-import { api } from "./answer/convex/_generated/api";

-test("compare schema", async ({ skip }) => {
-  await compareSchema(skip);
+type SimpleSchema = { tables: Array<{ tableName: string }> };
+
+test("has exactly two tables: users and messages", async () => {
+  const schemaUnknown: unknown = await getSchema(responseAdminClient);
+  expect(schemaUnknown).not.toBeNull();
+  const { tables } = schemaUnknown as SimpleSchema;
+  const tableNames = tables.map((t) => t.tableName);
+  expect(tableNames).toEqual(["messages", "users"]);
 });

-test("compare function spec", async ({ skip }) => {
-  await compareFunctionSpec(skip);
+test("users table enforces single string field: name", async () => {
+  await deleteAllDocuments(responseAdminClient, ["users", "messages"]);
+
+  // valid insert
+  await addDocuments(responseAdminClient, "users", [{ name: "Alice" }]);
+  const users = await listTable(responseAdminClient, "users");
+  expect(users.length).toBe(1);
+
+  // extra field should fail
+  await expect(
+    addDocuments(responseAdminClient, "users", [
+      { name: "Bob", extra: "nope" },
+    ]),
+  ).rejects.toBeDefined();
+
+  // wrong type should fail
+  await expect(
+    addDocuments(responseAdminClient, "users", [
+      { name: 123 as unknown as string },
+    ]),
+  ).rejects.toBeDefined();
+
+  // missing required field should fail
+  await expect(
+    addDocuments(responseAdminClient, "users", [
+      {} as unknown as { name: string },
+    ]),
+  ).rejects.toBeDefined();
+});
+
+test("messages table enforces two string fields: text and authorName", async () => {
+  await deleteAllDocuments(responseAdminClient, ["users", "messages"]);
+
+  // valid insert
+  await addDocuments(responseAdminClient, "messages", [
+    { text: "Hello", authorName: "Alice" },
+  ]);
+  const messages = await listTable(responseAdminClient, "messages");
+  expect(messages.length).toBe(1);
+
+  // extra field should fail
+  await expect(
+    addDocuments(responseAdminClient, "messages", [
+      { text: "Hi", authorName: "Bob", extra: "nope" },
+    ]),
+  ).rejects.toBeDefined();
+
+  // wrong types should fail
+  await expect(
+    addDocuments(responseAdminClient, "messages", [
+      { text: 42 as unknown as string, authorName: "Alice" },
+    ]),
+  ).rejects.toBeDefined();
+  await expect(
+    addDocuments(responseAdminClient, "messages", [
+      { text: "Hello", authorName: 99 as unknown as string },
+    ]),
+  ).rejects.toBeDefined();
+
+  // missing required fields should fail
+  await expect(
+    addDocuments(responseAdminClient, "messages", [
+      { text: "Hello" } as unknown as { text: string; authorName: string },
+    ]),
+  ).rejects.toBeDefined();
+  await expect(
+    addDocuments(responseAdminClient, "messages", [
+      { authorName: "Alice" } as unknown as {
+        text: string;
+        authorName: string;
+      },
+    ]),
+  ).rejects.toBeDefined();
 });
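
Note on the 001-basic_schema grader: the new tests check behaviour (which inserts the table accepts or rejects) rather than comparing schema text. As a rough illustration of the rule being exercised, here is a minimal sketch of a strict "exactly these string fields" check. It is not Convex's validator and not code from this PR; `FieldSpec`, `conformsTo`, `usersSpec`, and `messagesSpec` are hypothetical names chosen for the example.

```typescript
// Hypothetical stand-in for strict schema validation (not Convex's implementation).
// A spec maps each required field name to its expected primitive type.
type FieldSpec = Record<string, "string">;

function conformsTo(doc: Record<string, unknown>, spec: FieldSpec): boolean {
  const docKeys = Object.keys(doc);
  const specKeys = Object.keys(spec);
  // Differing key counts mean the document has an extra or a missing field.
  if (docKeys.length !== specKeys.length) return false;
  // Every declared field must be present and carry the declared type.
  return specKeys.every((key) => typeof doc[key] === spec[key]);
}

// The two tables the grader expects, as described by the test names above.
const usersSpec: FieldSpec = { name: "string" };
const messagesSpec: FieldSpec = { text: "string", authorName: "string" };
```

Under this sketch, `{ name: "Alice" }` conforms to `usersSpec`, while `{ name: "Bob", extra: "nope" }`, `{ name: 123 }`, and `{}` do not, which mirrors the three rejection cases in the users test.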
16 changes: 13 additions & 3 deletions evals/000-fundamentals/002-basic_http_endpoint/TASK.txt
@@ -1,3 +1,13 @@
-Create a backend with only two HTTP routes. Serve the text "there" concatenated
-onto the request body for GETs at `/api/hello`. Serve an empty response for POSTs
-at `/api/messages/*`.
+Create a backend with only two HTTP routes.
+
+1) GET `/api/hello`
+   - Read the request body as text (treat missing/empty body as empty string)
+   - Respond with the body text concatenated with the literal string "there"
+   - Examples:
+     - body: "" -> response: "there"
+     - body: "hello" -> response: "hellothere"
+   - Set `Content-Type: text/plain`
+
+2) POST `/api/messages/*`
+   - Return HTTP 200 with an empty response body
+   - This route should match any path under `/api/messages/` (e.g. `/api/messages/123`). In Convex, use a wildcard route (e.g. `pathPrefix: "/api/messages/"`).
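
Note on the updated TASK.txt: the behaviour it specifies can be modelled without any framework, which may help when reading the grader that follows. The sketch below is only an illustration under assumptions: `SimpleResponse` and `handle` are hypothetical names, the 404 fallback is not stated in the task, and a real answer would register these routes through Convex's HTTP router rather than a hand-written dispatcher.

```typescript
// Hypothetical, framework-free model of the two routes described in TASK.txt.
// None of these names come from the PR or from Convex.
type SimpleResponse = {
  status: number;
  headers: Record<string, string>;
  body: string;
};

function handle(
  method: string,
  pathname: string,
  requestBody?: string | null,
): SimpleResponse {
  // Route 1: exact match on GET /api/hello.
  if (method === "GET" && pathname === "/api/hello") {
    // A missing or empty body is treated as the empty string.
    const text = requestBody ?? "";
    return {
      status: 200,
      headers: { "Content-Type": "text/plain" },
      body: text + "there",
    };
  }
  // Route 2: prefix match, so /api/messages/123 is accepted.
  if (method === "POST" && pathname.indexOf("/api/messages/") === 0) {
    return { status: 200, headers: {}, body: "" };
  }
  // Assumption: anything else is not routed.
  return { status: 404, headers: {}, body: "" };
}
```

For example, `handle("GET", "/api/hello", "hello").body` is `"hellothere"`, and `handle("POST", "/api/messages/123", "ignored")` has status 200 with an empty body, matching the two examples and the POST rule in the task.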
27 changes: 18 additions & 9 deletions evals/000-fundamentals/002-basic_http_endpoint/grader.test.ts
@@ -1,12 +1,21 @@
 import { expect, test } from "vitest";
-import {
-  responseAdminClient,
-  responseClient,
-  compareSchema,
-  compareFunctionSpec,
-} from "../../../grader";
-import { api } from "./answer/convex/_generated/api";
+import { siteUrl } from "../../../grader";

-test("compare function spec", async ({ skip }) => {
-  await compareFunctionSpec(skip);
+test("GET /api/hello returns body + 'there' (empty body => 'there')", async () => {
+  const res = await fetch(`${siteUrl}/api/hello`, { method: "GET" });
+  expect(res.ok).toBe(true);
+  const text = await res.text();
+  expect(text).toBe("there");
+  const contentType = res.headers.get("content-type");
+  expect(contentType && contentType.includes("text/plain")).toBe(true);
 });
+
+test("POST /api/messages/* returns empty response body", async () => {
+  const res = await fetch(`${siteUrl}/api/messages/123`, {
+    method: "POST",
+    body: "ignored",
+  });
+  expect(res.ok).toBe(true);
+  const text = await res.text();
+  expect(text).toBe("");
+});