Use config dict in TaskConfigLogger for easier serialization #454

albertvillanova · 2024-12-18T09:58:20Z

Use config dict (instead of dataclass) in TaskConfigLogger for easier serialization (e.g. json.dump).

Note that dataclasses are not JSON serializable out of the box.
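To make the point concrete, here is a minimal sketch of the failure and of the `dataclasses.asdict` workaround. `ExampleTaskConfig` and its fields are hypothetical stand-ins chosen for illustration, not the real `LightevalTaskConfig`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExampleTaskConfig:  # hypothetical stand-in, not the real LightevalTaskConfig
    name: str
    few_shots: int = 0


config = ExampleTaskConfig(name="demo_task", few_shots=5)

try:
    json.dumps(config)
    dump_failed = False
except TypeError:
    # The default encoder has no rule for arbitrary dataclass instances.
    dump_failed = True

# Converting to a plain dict first makes the same data serializable.
config_json = json.dumps(asdict(config))
```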

HuggingFaceDocBuilderDev · 2024-12-18T10:00:20Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

NathanHB

Thanks! I thought we wrote a JSON serializer for dataclasses. Also, was this causing issues?

NathanHB · 2024-12-19T12:37:13Z

src/lighteval/logging/info_loggers.py

@@ -35,7 +35,7 @@
 from lighteval.metrics.stderr import get_stderr_function
 from lighteval.models.abstract_model import ModelInfo
 from lighteval.models.model_output import ModelResponse
-from lighteval.tasks.lighteval_task import LightevalTask, LightevalTaskConfig


Are we using LightevalTaskConfig anywhere else?

clefourrier · 2024-12-19T19:29:15Z

I'm quite against going back from dataclass to dict

albertvillanova · 2024-12-20T07:40:00Z

Let me try to explain my point:

- The most common use case of the dataclasses.asdict utility function is serialization: converting a dataclass instance to a JSON-like dictionary, e.g. when sending data over a network or storing it in a database.
- The main purpose of our loggers is precisely to prepare all the evaluation data for serialization (done by EvaluationTracker).

In my opinion, converting dataclass instances to dicts is a common and recommended approach for logging and (later) serializing them.
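The pattern described above could look roughly like the following sketch. All class names here (`ExampleMetric`, `ExampleTaskConfig`, `SketchTaskConfigLogger`) are hypothetical and only illustrate the idea; the actual logger in lighteval may be structured differently.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExampleMetric:  # hypothetical nested dataclass
    name: str
    higher_is_better: bool = True


@dataclass
class ExampleTaskConfig:  # hypothetical stand-in for a task config
    name: str
    metrics: list[ExampleMetric] = field(default_factory=list)


class SketchTaskConfigLogger:
    """Illustrative logger that stores plain dicts rather than dataclass instances."""

    def __init__(self) -> None:
        self.task_configs: dict[str, dict] = {}

    def log(self, task_name: str, config: ExampleTaskConfig) -> None:
        # asdict recurses into nested dataclasses, lists and dicts.
        self.task_configs[task_name] = asdict(config)


logger = SketchTaskConfigLogger()
logger.log("demo_task", ExampleTaskConfig("demo_task", [ExampleMetric("accuracy")]))

# Everything the logger holds is now made of JSON-compatible types.
serialized = json.dumps(logger.task_configs)
```

Typed dataclasses remain available to the rest of the code; only the logged copy is a dict.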


albertvillanova · 2024-12-20T15:04:30Z

Let me give another example: in my opinion, a user would expect to be able to serialize the results obtained from the Pipeline:

results = pipeline.get_results()
results_json = json.dumps(results)

However, the code above raises an error:

TypeError: Object of type LightevalTaskConfig is not JSON serializable

The user needs a custom serializer.

See this code in the leaderboard, which is no longer valid:
https://huggingface.co/spaces/demo-leaderboard-backend/backend/blob/5c763da56426909e1b3d01e88d9b7382b9287a8a/src/backend/run_eval_suite_lighteval.py#L96-L98

The PR addresses this issue.
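For comparison, this is roughly what a user-side custom serializer could look like when the results still contain dataclass instances. It is a sketch built on the standard `default=` hook of `json.dumps`; `ExampleTaskConfig`, `dataclass_default` and the `"config_tasks"` key are assumptions made for illustration and are not taken from lighteval or the leaderboard code.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class ExampleTaskConfig:  # hypothetical stand-in for a task config
    name: str
    few_shots: int = 0


def dataclass_default(obj):
    """Fallback for json.dumps: convert dataclass instances to dicts."""
    # is_dataclass is also true for dataclass types, so exclude classes.
    if is_dataclass(obj) and not isinstance(obj, type):
        return asdict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical results structure holding a dataclass instance.
results = {"config_tasks": {"demo_task": ExampleTaskConfig("demo_task", 5)}}
results_json = json.dumps(results, default=dataclass_default)
```

With dicts stored in the results, the plain `json.dumps(results)` call works and no such hook is needed.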

albertvillanova · 2024-12-24T06:29:26Z

> I'm quite against going back from dataclass to dict

I've been considering the use of dataclasses inside the logged information and would like to understand more about the rationale behind this choice. It would be helpful to know the considerations or advantages that led to this decision. Could you please share some insights or context about the thought process involved? Thanks.

CC: @clefourrier

clefourrier · 2024-12-26T08:29:40Z

Ofc, and thanks for your very detailed answers!

We went the dataclass way first to constrain keys used (without having to add a lot of post init logic as you would for a dict), get implicit typing about said keys, and to help us when coding with IDEs as we now get autocompletion.

We added a dataclass-to-JSON serializer afterwards and changed all the dict classes we were using to dataclasses, and it did make coding much faster.
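The key-constraining benefit mentioned above can be illustrated with a small sketch. `ExampleTaskConfig` and its fields are hypothetical; the point is only that the dataclass-generated `__init__` rejects unknown keys, whereas a dict accepts them silently.

```python
from dataclasses import dataclass


@dataclass
class ExampleTaskConfig:  # hypothetical stand-in for a task config
    name: str
    few_shots: int = 0


# A dict silently accepts a misspelled key ("few_shot" instead of "few_shots").
as_dict = {"name": "demo_task", "few_shot": 5}

# The generated __init__ rejects the same typo immediately.
try:
    ExampleTaskConfig(name="demo_task", few_shot=5)
    typo_rejected = False
except TypeError:
    typo_rejected = True
```

Note that the type annotations are not enforced at runtime; they mainly help IDEs and static type checkers.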

Use config asdict in TaskConfigLogger

146b307

albertvillanova added 3 commits December 18, 2024 11:51

Fix tasks_configs name in docstring

ead56f1

Rename tasks_configs to task_configs

5ac5646

Merge branch 'main' into task-config-logger

c30727a

NathanHB reviewed Dec 19, 2024


Merge branch 'main' of github.com:huggingface/lighteval into task-con…

d79aaa8

…fig-logger
