test: add smoke test based on examples/samples-readme.md #719

Closed
wants to merge 4 commits into from

njhale (Member) commented Aug 6, 2024

  • Add a smoke test based on examples/samples-readme.md
  • Strip callProgress events before the LLM comparison (this keeps messages from growing too large and failing OpenAI API validation)
  • Regenerate golden files for existing models to drop callProgress
    events (we weren't comparing these anyway)
  • Add golden files for gpt-4o-2024-08-06
  • Focus comparison on matching event types to reduce false negatives
  • Drop "ignore callProgress" rule (we're eliding them from the event
    stream before sending them to the judge now)
  • Add gpt-4o-2024-08-06 to our smoke test action
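
The event handling described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Event` struct, the `stripEventType` and `sameEventTypes` helpers, and every event type name other than `callProgress` are assumptions made for the example. The PR hands the comparison to an LLM judge; the deterministic type check here only shows the idea of matching on event types after progress events have been removed.

```go
package main

import "fmt"

// Event is a hypothetical, simplified stand-in for a recorded run event.
type Event struct {
	Type    string
	Content string
}

// stripEventType returns a copy of events without entries of the given type.
func stripEventType(events []Event, drop string) []Event {
	kept := make([]Event, 0, len(events))
	for _, e := range events {
		if e.Type != drop {
			kept = append(kept, e)
		}
	}
	return kept
}

// sameEventTypes reports whether both streams contain the same sequence of
// event types, ignoring content (which varies between model runs).
func sameEventTypes(expected, actual []Event) bool {
	if len(expected) != len(actual) {
		return false
	}
	for i := range expected {
		if expected[i].Type != actual[i].Type {
			return false
		}
	}
	return true
}

func main() {
	// Golden stream, already regenerated without callProgress events.
	golden := []Event{
		{Type: "callStart"},
		{Type: "callFinish", Content: "hello"},
	}
	// Fresh stream from a run, still containing progress events.
	actual := []Event{
		{Type: "callStart"},
		{Type: "callProgress", Content: "hel"},
		{Type: "callProgress", Content: "hello"},
		{Type: "callFinish", Content: "Hello!"},
	}

	trimmed := stripEventType(actual, "callProgress")
	fmt.Println(len(trimmed), sameEventTypes(golden, trimmed)) // prints "2 true"
}
```

Dropping the progress events before comparison shrinks the payload sent to the judge, and comparing types rather than content tolerates wording differences between model runs.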

njhale (Member, Author) commented Aug 6, 2024

Looks like mistral is flaking for this test. I'll check it out in a bit.

StrongMonkey previously approved these changes Aug 6, 2024
thedadams previously approved these changes Aug 6, 2024
- Regenerate golden files for existing models to drop callProgress
  events (we weren't comparing these anyway)
- Add golden files for `gpt-4o-2024-08-06`

Signed-off-by: Nick Hale <[email protected]>
- Focus comparison on matching event types to reduce false negatives
- Drop "ignore callProgress" rule (we're eliding them from the event
  stream before sending them to the judge now)

Signed-off-by: Nick Hale <[email protected]>
njhale (Member, Author) commented Oct 14, 2024

This is stale. I'll open a new PR with the test when I get some time this week.

@njhale njhale closed this Oct 14, 2024