Evaluation function based on LLM grading #7

Open · msaelices wants to merge 4 commits into main

Conversation

msaelices

Changes

  • New is_correct() evaluation function, which asks an LLM to judge whether a response is correct (a rough sketch of the general shape follows below)

Proof-of-life

[screenshot: 2023-05-24_12-35]
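For illustration, a minimal sketch of what an LLM-graded is_correct() could look like, assuming the pre-1.0 openai Python client and made-up argument names (a guess at the shape, not the PR's actual code):

```python
# Hypothetical sketch only -- the real implementation in this PR may differ.
import openai  # assumes the pre-1.0 openai-python ChatCompletion interface


def is_correct(response: str, expectation: str, model: str = "gpt-4") -> bool:
    """Ask an LLM grader whether `response` satisfies `expectation`."""
    prompt = (
        "You are a strict grader. Answer with a single word: YES or NO.\n"
        f"Expected behavior: {expectation}\n"
        f"Candidate response: {response}\n"
        "Is the candidate response correct?"
    )
    completion = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = completion["choices"][0]["message"]["content"]
    return answer.strip().upper().startswith("YES")
```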

@edwardmfho

Test-ran the example cases using gpt-3.5-turbo instead of the default gpt-4 (still waiting for GPT-4 API access). It is a good function to add.
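Assuming the grader takes a model argument as in the sketch above (an assumption, not something the PR confirms), swapping in gpt-3.5-turbo would look roughly like:

```python
# Hypothetical call -- argument names are illustrative, not the PR's actual API.
is_correct(response="Paris", expectation="Names the capital of France", model="gpt-3.5-turbo")
```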


@edwardmfho left a comment


Looks good.

@mistercrunch (Member) commented Jun 19, 2023

I like the idea; the only thing is that the implementation is very OpenAI-specific, which is probably fine as long as we make that clear. How about we break evals.py down into evals/__init__.py and evals/openai.py?

The goal would be to import is_correct from promptimize.evals.openai.
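A sketch of the layout that suggestion implies (hypothetical; not part of this PR as submitted):

```python
# Suggested package layout (illustrative):
#
#   promptimize/
#       evals/
#           __init__.py   # provider-agnostic evaluators and shared helpers
#           openai.py     # OpenAI-specific graders such as is_correct
#
# Callers would then import the OpenAI-backed grader explicitly:
from promptimize.evals.openai import is_correct
```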

@edwardmfho

Should we begin with some sort of base eval function/class that could be used for other LLMs?
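One possible shape for that, as a hedged sketch (class and method names are made up, not promptimize API):

```python
# Hypothetical base class for provider-agnostic LLM grading; illustrative only.
from abc import ABC, abstractmethod


class BaseLLMGrader(ABC):
    """Interface an LLM-backed grader could implement for any provider."""

    grading_prompt = (
        "Answer with a single word: YES or NO.\n"
        "Expected behavior: {expectation}\n"
        "Candidate response: {response}\n"
        "Is the candidate response correct?"
    )

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Send the grading prompt to a specific LLM provider and return its text."""

    def is_correct(self, response: str, expectation: str) -> bool:
        prompt = self.grading_prompt.format(expectation=expectation, response=response)
        return self.complete(prompt).strip().upper().startswith("YES")
```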

