
Revert "Fix UserWarning: The torch.cuda.*DtypeTensor constructors are… #5165

Closed
wants to merge 1 commit

Conversation

loadams
Contributor

@loadams loadams commented Feb 20, 2024

Reverts 177dc14331a64e61f6dcce2c4b8071576bcb22db since it breaks tests in Megatron-DeepSpeed.

@loadams loadams requested a review from lekurile February 20, 2024 22:48
@lekurile
Contributor

Hello @ShukantPal,

Appreciate the effort on the initial PR fixing the warning. However, this change causes an issue in the Megatron-DeepSpeed type check code here:
https://github.com/microsoft/Megatron-DeepSpeed/blob/d47f3cda3a9316ddc68e7f0ef904d1650ba6419d/megatron/model/module.py#L138

Here's the resulting error:

 File "/Megatron-DeepSpeed/megatron/model/module.py", line 139, in half_conversion
    if isinstance(val_typecheck, _FLOAT_TYPES):
TypeError: isinstance() arg 2 must be a type or tuple of types

Since the CUDA accelerator no longer returns a torch type, but functools.partial:

(Pdb) p _FLOAT_TYPES
(<class 'torch.FloatTensor'>, functools.partial(<built-in method tensor of type object at 0x7f3d941c5420>, dtype=torch.float32, device='cuda'))

We hit an issue when training with the Megatron-DeepSpeed repo. For now, we're looking to revert this PR and would appreciate any feedback or suggestions from your end.

Thanks,
Lev

@ShukantPal
Contributor

Hi @lekurile,

The Megatron-DeepSpeed repo would probably have to change how it detects tensor types to handle this, e.g. by calling the tensor factory and checking the resulting dtype; something like this:

        if callable(val_typecheck) and val_typecheck([0]).dtype == torch.float:
            val = float16_convertor(val)
        return val
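A torch-free sketch of that dtype-based check, using a hypothetical `FakeTensor` stand-in in place of the accelerator's `functools.partial` over `torch.tensor` (names below are illustrative, not from either repo):

```python
import functools

# Hypothetical stand-in for a tensor: just records its data and dtype.
class FakeTensor:
    def __init__(self, data, dtype):
        self.data = data
        self.dtype = dtype

# Stand-in for the accelerator's factory, mirroring the real
# functools.partial(torch.tensor, dtype=torch.float32, device='cuda').
float_factory = functools.partial(FakeTensor, dtype="float32")
half_factory = functools.partial(FakeTensor, dtype="float16")

def is_float_factory(val_typecheck):
    # Instead of passing the factory to isinstance(), call it on a tiny
    # input and inspect the dtype of the result.
    return callable(val_typecheck) and val_typecheck([0]).dtype == "float32"

print(is_float_factory(float_factory))  # True
print(is_float_factory(half_factory))   # False
```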

@lekurile
Contributor

Thanks for the quick reply! We've created a PR in Megatron-DeepSpeed updating the type check to check against the accelerator specific dtype. We can close this revert PR and keep the changes.
https://github.com/microsoft/Megatron-DeepSpeed/pull/346/files

@loadams
Contributor Author

loadams commented Feb 21, 2024

Thanks @ShukantPal - closing in favor of @lekurile's PR in Megatron-DeepSpeed here.

@loadams loadams closed this Feb 21, 2024