Commit
* initial work on subtasks
* update readme
* remove readme (it's now in the PR writeup)
* skip tool store test if no openai
* fix typo
* correct message reading
* correct type
* Proof of concept for JSDoc Styles
  Baseline configuration with baseline implementation in a few places
* Use yarn to manage preact / htm
  This allows the types to flow from package.json
* Fully type util as proof of concept
* Cleanup in utils
* Proof of concept using types from log.d.ts
* Rough in of transcript dump
* Try to rein in type checking for prism here
* update to latest prettier
* Conditionally show transcript tab
* Another solver rendering
* Move transcript views
* Including TZ information in timestamp
* Tweaked step implementation
* Add tools to model dump
* Revise model event view (still WIP)
* More structured transcript
* A little more air
* A little more tweaking
* fix prettier complaint
* trying updating yarn.lock
* Attempt to force resolution
* Remove `json-schema-to-typescript`
  This is causing a failure in resolving dependencies which prevents yarn install from working.
* Improved state change view
* Further fine tuning of appearance
* Ensure store support too
* More standard appearance
* Fix content size
* Improve appearance
* Properly render objects and arrays in state changes
* Improve grid appearance
* Remove unused imports
* Correct subtask inline display
* Simplify state change event rendering
* Fix prettier
* Share event layout
* Improve logger event view
* add ScoreEvent
* remove unused var
* track state changes in transcript steps
* remove subtask stuff from web_search for now
* Improve state changes rendering
* Remove logger event title for more compactness (also includes improvements to the transcript event view itself)
* Add a scorer event handler
* Improve subtask rendering
* fix heading cursor
* Improve Score Event View
* merge from main
* turn event into a base class
* don't export event types
* regen schema
* fixup imports
* revert event type changes
* write schema to types dir
* transcript events: no export and standard 'data' field
* regen schema
* fix transcript test
* use pydantic 2.0 field_serialiser
* Revert "transcript events: no export and standard 'data' field"
  This reverts commit 5f2b654.
* use pydantic 2.0 field_serialiser
* don't export events
* remove unused import
* log the full logging message
* rename log method
* drive transcript events into recorder
* write through to trsc file
* cleaner interface for transcript event forwarding
* initial write to sqlite
* Standardize font-size on rem
* decorate the html tag for the logview so it can detect vscode immediately
* Improve column allocation
* Create Shared Fonts Object
  Move all things to a shared notion of fonts and styles that can be re-used easily. Use font scaling in vscode to achieve the correct appearance (now that we’re rem based we can just change the base font size).
* Move summary stats into navbar
* Restructure navbar into workspace
* Improve progress bar appearance
* Improve column sizing
* Refactor tab appearance into navbar
* Adjust correct/incorrect appearance
* Baseline pill improvements
* fix heading height in vscode
* correct sidebar
* Improve sidebar appearance (+prettier)
* widen sidebar slightly
* Sample Display Tweaks
* Tweaks to config
* initial work on content db
* more comprehensive event walking
* de-duplicate content in transcript events
* Remove timestamps, correct prop name
* Baseline implementation of evalevents resolving plus some prettier formatting
* Correct content resolution
* remove logging section of evallog (now covered in sample transcript)
* Improve density when hosted in vscode at narrow short sizes
* Revised appearance to grouped cards
* formatting
* A little more tweakage
* generate_loop and tool_call steps
* Fix lint issues
* no srsly fix lint
* resolve circular import
* run prettier on event panel
* Fix error w/specific logs
* update test
* Improve find band appearance
* sample init event
* Proof of concept state rendering
* Relocate state code since it will grow
* correct resolution of objects
* lint and formatting
* sample_init event
* Add collapsible state to event panel, collapse certain panels
* Subtask rough in
* ensure we have vite
* Correct merge + regen dist
* add a watch command
* correct formatting
* correct build (investigating why my local build wasn’t flagging this)
* include source maps
* Add Sample Init, track state across transcript
* fix lint
* update dist
* ensure nav-pills align bottom
* correct lint
* Add chat view to model
* prettier
* ran commands in wrong order
* Improve sample init event (still mostly a dump of data)
* Add all messages view
* Simplify transcript view
* Improvements to display
* Chatview should show tool call even if no message
* Improve state event display
* Display choices in sampleinit event
* Improve Score Event appearance
* Tweak title results
* More appearance tweakage
* Improve tab appearance
* Fix tab selection issue in subtask transcripts
* Improved spacing
* Fix scoring metadata layout
* toolcall event
* initial work on raw request/response for model event
* Add placeholder tool event
* initial work on raw model calls
* log raw tool argument not converted
* log raw model call for anthropic
* format attribs notebook
* raw model request for mistral
* Add depth to cards (with basic impl)
* remove map
* ignore source map
* Add baseline model raw view
* Improve state appearance
* Improve log display
* fix formatting
* properly default to messages if no transcript
* add one last debug message
* Disable build checking with note
* Appearance refinement
  - only start indenting at second level step
  - create section component
* raw capture for google
* Don’t capture args when logging
  This does a lot of work that shouldn’t happen in a log handler (and the value of the args is suspect anyhow), and it causes an exception in certain environments.
* Remove disused imports
* record raw api calls for groq
* Improve root solver display
  - break up root cards
  - add sample init step (synthetic)
* raw api call recording
* raw model access for cloudflare
* raw model output for azureai
* Improve subtask display
* raw model capture for vertex
* eliminate qualifying note on tool descriptions
* improve setup display
* Add ToolView
* improved agents api docs
* Tweaks
* eliminate tool steps
* hide agents api for now
* agents api docs
* Resolve the model call contents
* Special case handling for sample init event title (no dupe title)
* Improve logging event appearance
* more tool docs
* rename to agents api
* remove bash prompt
* Correct transcript icons
* improve tab selection behavior
* Improved model display
* Correct font size in metadatagrid
* initial work on tool transcript
* more tool event work
* schema updates
* Refactor content resolution to the very top level (move it out of transcript view - it expects to receive events with resolved content)
* Resolve the whole state object for events
* remove generate_loop step type
* Fix ruff errors
* don’t force wrap in weird ways
* Correct tool rendering for state events
* Baseline visual diff implementation
* Move tools to single transcript tab
* Improve tool event output rendering
* Don’t indent tool input / output
* enable docs + add parallel execution
* Fix prism coloring for js, python, json
* show no output message if there is no tool output
* allow event titles to wrap
* Improve wrapping at small sizes (model event)
* crossref to agents api article

---------

Co-authored-by: aisi-inspect <[email protected]>
Co-authored-by: Charles Teague <[email protected]>
Co-authored-by: jjallaire-aisi <[email protected]>