Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AutoSxS Movel Evaluation for LLMs #443

Merged
merged 4 commits into from
Apr 17, 2024
Merged

AutoSxS Movel Evaluation for LLMs #443

merged 4 commits into from
Apr 17, 2024

Conversation

BenoitDherin
Copy link
Collaborator

No description provided.

@BenoitDherin BenoitDherin requested a review from takumiohym April 17, 2024 00:52
@BenoitDherin BenoitDherin self-assigned this Apr 17, 2024
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -0,0 +1,1055 @@
{
Copy link
Collaborator

@takumiohym takumiohym Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the file name should include 'AutoSxS' to differentiate from other evaluation services.


Reply via ReviewNB

@@ -0,0 +1,1055 @@
{
Copy link
Collaborator

@takumiohym takumiohym Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #9.    BUCKET_URI = f"gs://{BUCKET}/autosxs-{UUID}"

Minor: Maybe we can use a fixed name for simplicity as the other labs. Then we can remove the function, and no need to import string standard library.


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Replaced with a timestamp though to avoid pipeline runs to overwrite in the same directory and simplify the logic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New logic:

timestamp = str(current_datetime.timestamp()).replace(".","")
display_name = f"autosxs-{timestamp}"
pipeline_root = os.path.join("gs://", BUCKET, display_name)

@@ -0,0 +1,1055 @@
{
Copy link
Collaborator

@takumiohym takumiohym Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"with "response_a" and "response_b" representing different article summaries." -> "with "response_a" and "response_b" representing different article summaries generated from different LLM models." for more clarity?


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -0,0 +1,1055 @@
{
Copy link
Collaborator

@takumiohym takumiohym Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{task}@{version} -> {task} ?

Adding the expected parameters under autorater_prompt_parameters would be helpful.

Also, I'd add that we can simply specify models (model_a and model_b ) instead of specifying pre-generated responses (response_column_a and response_column_b).


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -0,0 +1,1055 @@
{
Copy link
Collaborator

@takumiohym takumiohym Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #4.        "id_columns": ["id", "document"],

I guess only id is sufficient if it is truly a unique id.


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreeing. This is weird. I'll try with id only see if that works. Otherwise, I'll revert.

@@ -0,0 +1,1055 @@
{
Copy link
Collaborator

@takumiohym takumiohym Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for details in job.task_details:
    if details.task_name == "model-evaluation-text-generation-pairwise":
        break

Same as the comment above.

online_eval_task = [task for task in job.task_details if task.task_name=="online-evaluation-pairwise"][0]

Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's nicer. Thanks. Done!

Copy link
Collaborator

@takumiohym takumiohym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left minor comments, but LGTM otherwise. Very simple and good lab!

@BenoitDherin BenoitDherin merged commit 9fac11e into master Apr 17, 2024
5 checks passed
@BenoitDherin BenoitDherin deleted the autosxs-benoit-dev branch April 17, 2024 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants