AutoSxS Model Evaluation for LLMs #443
Conversation
I think the file name should include 'AutoSxS' to differentiate from other evaluation services.
Line #9. `BUCKET_URI = f"gs://{BUCKET}/autosxs-{UUID}"`
Minor: maybe we can use a fixed name for simplicity, as in the other labs. Then we can remove the function, and there is no need to import the `string` standard library.
Done! Replaced with a timestamp instead, to avoid pipeline runs overwriting the same directory and to simplify the logic.
New logic:

```python
timestamp = str(current_datetime.timestamp()).replace(".", "")
display_name = f"autosxs-{timestamp}"
pipeline_root = os.path.join("gs://", BUCKET, display_name)
```
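The naming logic above can be exercised as a self-contained snippet (a sketch; `BUCKET` here is a hypothetical bucket name, not from the notebook):

```python
import os
from datetime import datetime

BUCKET = "my-bucket"  # hypothetical bucket name for illustration

# A timestamp with the "." removed yields a unique suffix per run,
# so repeated pipeline runs do not overwrite the same directory.
current_datetime = datetime.now()
timestamp = str(current_datetime.timestamp()).replace(".", "")
display_name = f"autosxs-{timestamp}"
pipeline_root = os.path.join("gs://", BUCKET, display_name)
```

Because the first component `"gs://"` already ends with a separator, `os.path.join` concatenates the parts without inserting an extra slash.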
"with `response_a` and `response_b` representing different article summaries." -> "with `response_a` and `response_b` representing different article summaries generated from different LLM models." for more clarity?
Done!
`{task}@{version}` -> `{task}`?
Adding the expected parameters under `autorater_prompt_parameters` would be helpful.
Also, I'd add that we can simply specify models (`model_a` and `model_b`) instead of specifying pre-generated responses (`response_column_a` and `response_column_b`).
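For context, a parameter dict along the lines the reviewer describes might look like this. This is a sketch, not the notebook's actual cell: the dataset path and model names are placeholders, and the exact parameter names should be checked against the Vertex AI AutoSxS pipeline documentation.

```python
# Hypothetical AutoSxS parameter dict illustrating the reviewer's points:
# prompt columns go under autorater_prompt_parameters, and either
# pre-generated responses or models can be compared.
parameters = {
    "evaluation_dataset": "gs://my-bucket/evaluation_dataset.jsonl",  # placeholder path
    "id_columns": ["id"],
    "task": "summarization",  # "{task}" without the "@{version}" suffix
    "autorater_prompt_parameters": {
        "inference_context": {"column": "document"},
        "inference_instruction": {"template": "Summarize: {{ document }}"},  # placeholder template
    },
    # Option 1: compare pre-generated responses from the dataset...
    "response_column_a": "response_a",
    "response_column_b": "response_b",
    # ...or Option 2: let the pipeline generate responses from models instead:
    # "model_a": "publishers/google/models/...",  # placeholder model resource
    # "model_b": "publishers/google/models/...",
}
```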
Done!
Line #4. `"id_columns": ["id", "document"],`
I guess `id` alone is sufficient if it is truly a unique id.
Agreed, this is weird. I'll try with `id` only and see if that works; otherwise, I'll revert.
```python
for details in job.task_details:
    if details.task_name == "model-evaluation-text-generation-pairwise":
        break
```

Same as the comment above:

```python
online_eval_task = [task for task in job.task_details if task.task_name == "online-evaluation-pairwise"][0]
```
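The suggested one-liner can be tried out with stand-in objects (a sketch: the `task_details` entries here are simple namespaces rather than real pipeline task objects, and the task names are copied from the thread):

```python
from types import SimpleNamespace

# Stand-in for job.task_details: a list of task records with a task_name attribute.
task_details = [
    SimpleNamespace(task_name="online-evaluation-pairwise"),
    SimpleNamespace(task_name="model-evaluation-text-generation-pairwise"),
]

# Pick the first task whose name matches, replacing the for/break loop.
online_eval_task = [
    task for task in task_details if task.task_name == "online-evaluation-pairwise"
][0]
```

A list comprehension with `[0]` raises `IndexError` if no task matches, which surfaces the problem immediately instead of silently continuing with the last loop variable.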
That's nicer. Thanks. Done!
Left minor comments, but LGTM otherwise. Very simple and good lab!