Consider this a mix of bug report and PR. Unfortunately, it would take me too long to put together a toy example that reproduces it, but the issue is simple and obvious enough.
So, there is an `eval_use_gather_object` argument in `TrainingArguments` that allows non-tensor objects, or tensors with different shapes, to be gathered during eval.
It is handled here: https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L5103
# create accelerator object
self.accelerator = Accelerator(**args)
# some Trainer classes need to use `gather` instead of `gather_for_metrics`, thus we store a flag
self.gather_function = self.accelerator.gather_for_metrics
if "use_gather_object" in inspect.signature(self.gather_function).parameters.keys():
    self.gather_function = functools.partial(
        self.gather_function, use_gather_object=self.args.eval_use_gather_object
    )
However, I noticed that after the first eval, the second eval crashes while trying to concatenate batches with different shapes, as if the `eval_use_gather_object` flag had stopped working. Indeed it has: in `evaluation_loop`, `self.gather_function` is reset to the bare `gather_for_metrics`, and `eval_use_gather_object` is not applied again: https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L4359
# After all calls to `.gather_function`, reset to `gather_for_metrics`:
self.gather_function = self.accelerator.gather_for_metrics
if "use_gather_object" in inspect.signature(self.gather_function).parameters.keys():
    self.gather_function = functools.partial(
        self.gather_function, use_gather_object=self.args.eval_use_gather_object
    )
Suggested fix (I am using it myself): right after the reset, re-apply the flag with the same `functools.partial` wrapping that is used when the accelerator is created.
https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L4359
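For anyone who wants to see the mechanism without a full training run, here is a minimal sketch of it. `FakeAccelerator`, `ToyTrainer`, `_bind_flag`, and `reapply_flag_on_reset` are hypothetical stand-ins written for illustration, not the real `Accelerator` or `Trainer` code. The sketch only assumes that `gather_for_metrics` accepts a `use_gather_object` keyword, and it returns a label instead of gathering anything.

```python
import functools
import inspect


class FakeAccelerator:
    """Hypothetical stand-in for the accelerator; it does no real gathering."""

    def gather_for_metrics(self, input_data, use_gather_object=False):
        # Report which code path would be taken instead of gathering.
        return "object" if use_gather_object else "tensor"


class ToyTrainer:
    """Hypothetical, stripped-down model of the logic quoted in this issue."""

    def __init__(self, eval_use_gather_object, reapply_flag_on_reset):
        self.eval_use_gather_object = eval_use_gather_object
        self.reapply_flag_on_reset = reapply_flag_on_reset
        self.accelerator = FakeAccelerator()
        # Initial setup: the flag is bound into the gather function.
        self.gather_function = self._bind_flag(self.accelerator.gather_for_metrics)

    def _bind_flag(self, fn):
        # Same signature check and partial wrapping as in the quoted snippet.
        if "use_gather_object" in inspect.signature(fn).parameters:
            return functools.partial(fn, use_gather_object=self.eval_use_gather_object)
        return fn

    def evaluation_loop(self):
        path = self.gather_function(["a", "b"])
        # Reset after all calls to the gather function. Without re-binding,
        # the flag is silently dropped for every later evaluation.
        self.gather_function = self.accelerator.gather_for_metrics
        if self.reapply_flag_on_reset:
            self.gather_function = self._bind_flag(self.gather_function)
        return path


buggy = ToyTrainer(eval_use_gather_object=True, reapply_flag_on_reset=False)
buggy_paths = [buggy.evaluation_loop(), buggy.evaluation_loop()]

fixed = ToyTrainer(eval_use_gather_object=True, reapply_flag_on_reset=True)
fixed_paths = [fixed.evaluation_loop(), fixed.evaluation_loop()]

print(buggy_paths)  # second eval falls back to the tensor path
print(fixed_paths)  # flag survives the reset
```

In this toy model, the first evaluation uses the object path in both cases, but only the variant that re-binds the flag after the reset keeps using it on the second evaluation, which matches the crash appearing only from the second eval onwards.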