added framework to score evaluation metrics #33

oindrillac · 2024-02-05T20:40:07Z

To drill down on the best GenAI evaluation criteria, I added a framework that produces a quantitative evaluation matrix showing how often the metric scores are valid, by:

Looking at cases where we know the generated output is deliberately wrong, checking how the allotted scores perform, and comparing them against human eval scores
Doing this over a number of outputs for each criterion
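As a rough illustration of the comparison described above, here is a minimal sketch that is not taken from the notebook. The criterion names, the 0 to 1 score scale, the `tolerance` threshold, and the `validity_rate` helper are all assumptions made for the example. It counts, per criterion, how often a metric score lands close to the human score on deliberately wrong outputs.

```python
from collections import defaultdict

# Hypothetical records, one per (criterion, example). Every generated output
# here is deliberately wrong, so a trustworthy metric should track the low
# human score. All names and values are illustrative only.
records = [
    {"criterion": "accuracy", "metric_score": 0.2, "human_score": 0.1},
    {"criterion": "accuracy", "metric_score": 0.9, "human_score": 0.2},
    {"criterion": "readability", "metric_score": 0.3, "human_score": 0.4},
    {"criterion": "readability", "metric_score": 0.1, "human_score": 0.2},
]


def validity_rate(records, tolerance=0.25):
    """Return, per criterion, the fraction of examples where the metric
    score is within `tolerance` of the human score."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["criterion"]] += 1
        if abs(r["metric_score"] - r["human_score"]) <= tolerance:
            agree[r["criterion"]] += 1
    return {criterion: agree[criterion] / total[criterion] for criterion in total}


rates = validity_rate(records)
print(rates)
```

A criterion whose rate stays low across many deliberately wrong outputs is a candidate for being dropped or reworked; the threshold itself is a judgment call.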

I added 6 examples. @hemajv @aakankshaduggal, feel free to follow the same example structure and append more examples to the dataframe, which can be imported from a pickle file in the same folder.

review-notebook-app · 2024-02-05T20:40:12Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

hemajv

@oindrillac The additions look great 🎉 I left a few comments, mainly around structuring the notebook.

hemajv · 2024-02-06T16:56:42Z

notebooks/evaluation/evaluation_metrics.ipynb

+   "id": "b74e0295-9269-4dd4-8e38-b6bb22c679db",
+   "metadata": {},
+   "source": [
+    "## Quantitative Evaluation"


Should we move the quantitative evaluation, along with the examples, to a new notebook? This notebook is getting quite lengthy.

hemajv · 2024-02-06T16:57:57Z

notebooks/evaluation/evaluation_metrics.ipynb

+   },
+   "outputs": [],
+   "source": [
+    "def get_response(model_id, file, functions, classes, documentation, imports, other, functions_code, functions_doc, classes_code, classes_doc):\n",


We could move these 3 functions to a helper_functions.ipynb and import them from there.
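Python cannot import a `.ipynb` file directly, so this suggestion needs a small loading step. One possible sketch, using only the standard library, is below. The `load_notebook_functions` helper and the throwaway `add_one` function are hypothetical; the approach assumes the helper notebook's code cells contain plain Python with no IPython magics.

```python
import json
import tempfile
from pathlib import Path


def load_notebook_functions(path):
    """Execute the code cells of a notebook and return the resulting namespace."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    namespace = {}
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell["source"]
        # The notebook format stores source as a string or a list of lines.
        code = source if isinstance(source, str) else "".join(source)
        exec(code, namespace)
    return namespace


# Demo with a throwaway notebook standing in for helper_functions.ipynb.
demo_nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["## Helpers"]},
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
            "source": ["def add_one(x):\n", "    return x + 1\n"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

with tempfile.TemporaryDirectory() as tmp:
    nb_path = Path(tmp) / "helper_functions.ipynb"
    nb_path.write_text(json.dumps(demo_nb), encoding="utf-8")
    helpers = load_notebook_functions(nb_path)

result = helpers["add_one"](41)
print(result)
```

Keeping the helpers in a plain `.py` module instead would avoid this step entirely, at the cost of losing the notebook rendering.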

hemajv · 2024-02-06T17:04:04Z

notebooks/evaluation/evaluation_metrics.ipynb

+   "id": "812910d9-9b4f-4430-bffe-d58bb4b67083",
+   "metadata": {},
+   "source": [
+    "## Copy this section, modify and run from here"


Maybe we can wrap all the steps done here into a function (kept in helper_functions.ipynb) and invoke it to run each example, wdyt?
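A sketch of what such a wrapper might look like, under stated assumptions: `run_example` is a hypothetical name, `generate` stands in for the notebook's `get_response(...)` call, the scorer is a toy word count rather than a real metric, and a list of dicts stands in for the dataframe.

```python
def run_example(results, example_id, generate, score_fns, human_scores):
    """Run one example end to end and append a row to `results`.

    results      -- list of row dicts (stand-in for the shared dataframe)
    generate     -- zero-argument callable returning the generated output
    score_fns    -- mapping of metric name to a function scoring the output
    human_scores -- mapping of criterion name to the human-assigned score
    """
    output = generate()
    row = {"example": example_id, "output": output}
    for name, fn in score_fns.items():
        row[f"{name}_score"] = fn(output)
    for name, value in human_scores.items():
        row[f"{name}_human"] = value
    results.append(row)
    return row


# Each new example becomes a single call instead of a copied notebook section.
results = []
run_example(
    results,
    example_id="deliberately_wrong_1",
    generate=lambda: "deliberately wrong documentation",
    score_fns={"word_count": lambda text: len(text.split())},  # toy scorer
    human_scores={"accuracy": 1},
)
print(results[0])
```

With this shape, contributors would only supply the example-specific inputs, and the append-to-results step cannot be accidentally skipped or altered per example.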

hemajv

Approving this, since we are addressing the refactoring comments in a separate PR.

added framework to score evaluation metrics

dbe271c

oindrillac requested review from hemajv and aakankshaduggal February 5, 2024 20:40

hemajv requested changes Feb 6, 2024

hemajv approved these changes Feb 7, 2024

hemajv merged commit ae8e26b into redhat-et:main Feb 7, 2024
0 of 2 checks passed
