Use config dict in TaskConfigLogger for easier serialization #454
base: main
Thanks! I thought we wrote a JSON serializer for dataclasses. Also, was this causing issues?
@@ -35,7 +35,7 @@
from lighteval.metrics.stderr import get_stderr_function
from lighteval.models.abstract_model import ModelInfo
from lighteval.models.model_output import ModelResponse
from lighteval.tasks.lighteval_task import LightevalTask, LightevalTaskConfig
Are we using the LightevalTaskConfig anywhere else?
I'm quite against going back from dataclasses to dicts.
Let me try to explain my point:
In my opinion, converting dataclass instances to dicts is a common and recommended approach for logging them (and later serializing them).
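As a minimal sketch of that conversion, the standard library's `dataclasses.asdict` turns a dataclass instance into plain Python containers, recursing into nested dataclasses and lists. `MetricSpec` and `TaskConfig` below are hypothetical stand-ins, not lighteval's real `LightevalTaskConfig`:

```python
from dataclasses import asdict, dataclass, field


# Hypothetical stand-ins for a task config; not lighteval's real classes.
@dataclass
class MetricSpec:
    name: str
    higher_is_better: bool = True


@dataclass
class TaskConfig:
    name: str
    few_shots: int = 0
    metrics: list[MetricSpec] = field(default_factory=list)


config = TaskConfig(name="demo_task", few_shots=5, metrics=[MetricSpec("acc")])

# asdict() recurses into nested dataclasses, lists, tuples and dicts,
# producing plain Python containers suitable for logging.
logged = asdict(config)
print(logged)
# {'name': 'demo_task', 'few_shots': 5, 'metrics': [{'name': 'acc', 'higher_is_better': True}]}
```

The resulting dict contains only built-in types here, so it can be handed to a logger or `json.dumps` without extra hooks.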
Let me give another example: in my opinion, a user would expect to be able to serialize the results obtained from the Pipeline:
results = pipeline.get_results()
results_json = json.dumps(results)
See this code in the leaderboard, which is no longer valid. This PR addresses the issue.
I've been considering the use of dataclasses inside the logged information and would like to understand the rationale behind this choice. What considerations or advantages led to this decision? Could you share some context on the thought process? Thanks. CC: @clefourrier
Of course, and thanks for your very detailed answers! We went the dataclass way first to constrain the keys used (without having to add a lot of post-init logic as you would for a dict), to get implicit typing for those keys, and to help us when coding with IDEs, since we now get autocompletion. We added a dataclass-to-JSON serializer afterwards and changed all the dicts we were using to dataclasses, and it indeed made coding much faster.
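The dataclass serializer mentioned above is not shown in this thread. A minimal sketch of the general technique, assuming a custom `json.JSONEncoder` subclass (the `DataclassJSONEncoder` and `Result` names are hypothetical, not lighteval's actual code):

```python
import dataclasses
import json


class DataclassJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes dataclass instances as dicts."""

    def default(self, o):
        # default() is only called for objects json cannot encode natively.
        # The isinstance check excludes dataclass *classes*, keeping only instances.
        if dataclasses.is_dataclass(o) and not isinstance(o, type):
            return dataclasses.asdict(o)
        # Let the base class raise TypeError for anything else.
        return super().default(o)


@dataclasses.dataclass
class Result:  # hypothetical result record
    task: str
    accuracy: float


payload = {"results": [Result("demo_task", 0.5)]}
encoded = json.dumps(payload, cls=DataclassJSONEncoder)
print(encoded)
# {"results": [{"task": "demo_task", "accuracy": 0.5}]}
```

One trade-off relevant to this discussion: every caller must pass `cls=DataclassJSONEncoder`, so a plain `json.dumps(results)` still fails unless the data is converted to dicts first.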
Use a config dict (instead of a dataclass) in TaskConfigLogger for easier serialization (e.g. with json.dump).
Note that dataclasses are not JSON serializable out of the box.
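A short illustration of that limitation, using a hypothetical `TaskConfig` dataclass (not lighteval's `LightevalTaskConfig`): the standard `json` module rejects the instance with a `TypeError`, while the equivalent dict serializes directly.

```python
import dataclasses
import json


@dataclasses.dataclass
class TaskConfig:  # hypothetical stand-in, not lighteval's LightevalTaskConfig
    name: str
    few_shots: int


config = TaskConfig(name="demo_task", few_shots=5)

# The stdlib encoder only handles dict, list, tuple, str, int, float,
# bool and None (plus their subclasses), so a dataclass instance is rejected.
raised = False
try:
    json.dumps(config)
except TypeError as err:
    raised = True
    print(f"json.dumps failed: {err}")

# Converting to a plain dict first makes the same data serializable.
as_json = json.dumps(dataclasses.asdict(config))
print(as_json)  # {"name": "demo_task", "few_shots": 5}
```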