Conversation

jz3280 commented Oct 29, 2024

No description provided.

jz3280 (Author) commented Oct 29, 2024

/evaluate

github-actions bot commented

Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results.

github-actions bot commented

Evaluation results

| metric | stat | baseline | pr24 |
| --- | --- | --- | --- |
| gpt_groundedness | mean_rating | 5.0 | 5.0 |
| gpt_groundedness | pass_rate | 1.0 | 1.0 |
| gpt_relevance | mean_rating | 4.95 | 4.95 |
| gpt_relevance | pass_rate | 1.0 | 1.0 |
| f1_score | mean | 0.43 | 0.43 |
| answer_length | mean | 609.1 | 625.75 |
| latency | mean | 2.67 | 2.43 |
| citations_matched | mean | 0.63 | 0.63 |

Check the workflow run for more details.

pamelafox (Owner) commented

Thanks for coming and PRing!

For your reference, here are my slides:
https://speakerdeck.com/pamelafox/github-universe-evaluating-rag-apps-in-github-actions

And the main repo this was based on:
https://github.com/pamelafox/rag-postgres-openai-python/
with evaluation guide here:
https://github.com/Azure-Samples/rag-postgres-openai-python/blob/main/docs/evaluation.md

To evaluate on Azure, the Azure AI Evaluation SDK docs are here:
https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk
And if you're interested in the Azure AI CI/CD private preview, sign up here:
https://aka.ms/genAI-CI-CD-private-preview
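
To get a feel for that SDK, here is a minimal sketch (assuming the azure-ai-evaluation Python package and an Azure OpenAI chat deployment; the endpoint, key, deployment, and sample texts below are placeholders) that scores a single answer for groundedness and relevance, the same metrics reported in the results table above:

```python
# Minimal sketch: assumes the azure-ai-evaluation package is installed
# (pip install azure-ai-evaluation) and an Azure OpenAI chat deployment exists.
# The endpoint, key, deployment name, and sample texts are placeholders.
from azure.ai.evaluation import (
    AzureOpenAIModelConfiguration,
    GroundednessEvaluator,
    RelevanceEvaluator,
)

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",
    azure_deployment="<your-chat-deployment>",
)

groundedness = GroundednessEvaluator(model_config)
relevance = RelevanceEvaluator(model_config)

query = "What does the free tier include?"
context = "The free tier includes 10 GB of storage and community support."
response = "It includes 10 GB of storage plus community support."

# Each evaluator returns a dict with a 1-5 rating, which is where rows like
# gpt_groundedness mean_rating in the table above come from.
print(groundedness(response=response, context=context))
print(relevance(query=query, response=response))
```

Exact constructor and call signatures vary a bit between SDK versions, so check the docs linked above before wiring this into a workflow.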

pamelafox closed this Oct 31, 2024