added framework to score evaluation metrics #33

Merged
merged 1 commit into from
Feb 7, 2024

Conversation

oindrillac
Contributor

To narrow down the best GenAI evaluation criteria, this PR adds a framework that builds a quantitative evaluation matrix showing how often each criterion's scores are valid. It does this by:

  • Taking cases where we know the generated output is deliberately wrong, checking how the assigned scores behave, and comparing them against human evaluation scores
  • Repeating this over a number of outputs for each criterion

I added 6 examples. @hemajv @aakankshaduggal, feel free to follow the same example structure and append more examples to the dataframe, which can be imported from a pickle file in the same folder.
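
For anyone adding examples, here is a rough sketch of the kind of check described above: for each criterion, count how often the automated score and the human score land on the same side of a pass/fail threshold. It is not the notebook's code. The field names (`criterion`, `deliberately_wrong`, `metric_score`, `human_score`), the 0 to 1 score scale, the 0.5 threshold, and the sample values are all assumptions made up for illustration, and a plain list of dicts stands in for the dataframe.

```python
from collections import defaultdict

# Hypothetical rows: one per (example, criterion). All values are invented
# for illustration and assume scores on a 0-1 scale.
examples = [
    {"criterion": "accuracy", "deliberately_wrong": True, "metric_score": 0.2, "human_score": 0.1},
    {"criterion": "accuracy", "deliberately_wrong": True, "metric_score": 0.9, "human_score": 0.2},
    {"criterion": "accuracy", "deliberately_wrong": False, "metric_score": 0.8, "human_score": 0.9},
    {"criterion": "readability", "deliberately_wrong": True, "metric_score": 0.3, "human_score": 0.4},
    {"criterion": "readability", "deliberately_wrong": False, "metric_score": 0.7, "human_score": 0.8},
]


def validity_rate(rows, threshold=0.5):
    """Return, per criterion, the fraction of rows where the automated
    score and the human score agree on pass/fail at the given threshold."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        metric_pass = row["metric_score"] >= threshold
        human_pass = row["human_score"] >= threshold
        total[row["criterion"]] += 1
        agree[row["criterion"]] += int(metric_pass == human_pass)
    return {criterion: agree[criterion] / total[criterion] for criterion in total}


rates = validity_rate(examples)
for criterion, rate in rates.items():
    print(f"{criterion}: agrees with human eval on {rate:.0%} of examples")
```

A criterion whose score stays high on a deliberately wrong output (the second row here) pulls its agreement rate down, which is the signal the framework is meant to surface.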

Collaborator

@hemajv hemajv left a comment


@oindrillac The additions look great 🎉 Left a few comments mainly around structuring the notebook

"id": "b74e0295-9269-4dd4-8e38-b6bb22c679db",
"metadata": {},
"source": [
"## Quantitative Evaluation"
Collaborator


Should we maybe move the quantitative evaluation along with the examples to a new notebook? This notebook seems to be getting quite lengthy

},
"outputs": [],
"source": [
"def get_response(model_id, file, functions, classes, documentation, imports, other, functions_code, functions_doc, classes_code, classes_doc):\n",
Collaborator


we can perhaps move these 3 functions to a helper_functions.ipynb and import from there

"id": "812910d9-9b4f-4430-bffe-d58bb4b67083",
"metadata": {},
"source": [
"## Copy this section, modify and run from here"
Collaborator


Maybe we can wrap all the steps being done here into a function (and include it in helper_functions.ipynb) and invoke this function to run each example, wdyt?
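
As a rough illustration of that suggestion, a wrapper could look something like the sketch below. This is only an assumption about the shape: `run_example`, the `generate` and `scorers` arguments, and the example fields (`name`, `prompt`, `reference`) are hypothetical names, not identifiers from the notebook. The generation and scoring steps are passed in as callables, so the real notebook functions could be plugged in without changing the wrapper.

```python
def run_example(example, generate, scorers, results):
    """Generate an output for one example, score it on every criterion,
    and append the resulting row to the shared results list."""
    output = generate(example["prompt"])
    row = {"name": example["name"], "output": output}
    for criterion, scorer in scorers.items():
        row[criterion] = scorer(example["reference"], output)
    results.append(row)
    return row


# Stand-in callables so the sketch runs on its own; in the notebook these
# would be replaced by the actual model call and evaluation metrics.
def fake_generate(prompt):
    return prompt.upper()


def exact_match(reference, output):
    return 1.0 if reference == output else 0.0


results = []
row = run_example(
    {"name": "example_1", "prompt": "hello", "reference": "HELLO"},
    generate=fake_generate,
    scorers={"exact_match": exact_match},
    results=results,
)
print(row)
```

Each example then becomes a single call, and the accumulated `results` list can be converted to a dataframe at the end.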

Collaborator

@hemajv hemajv left a comment


Approving this, since we are addressing the refactoring comments in a separate PR

@hemajv hemajv merged commit ae8e26b into redhat-et:main Feb 7, 2024
0 of 2 checks passed
2 participants