
Revert "Fix UserWarning: The torch.cuda.*DtypeTensor constructors are… #5165

Closed
wants to merge 1 commit

Conversation

loadams
Contributor

@loadams loadams commented Feb 20, 2024

Reverts 177dc14331a64e61f6dcce2c4b8071576bcb22db since it breaks tests in Megatron-DeepSpeed.

@loadams loadams requested a review from lekurile February 20, 2024 22:48
@lekurile
Contributor

Hello @ShukantPal,

Appreciate the effort on the initial PR fixing the warning. However, this change causes an issue in the Megatron-DeepSpeed type check code here:
https://github.com/microsoft/Megatron-DeepSpeed/blob/d47f3cda3a9316ddc68e7f0ef904d1650ba6419d/megatron/model/module.py#L138

Here's the resulting error:

 File "/Megatron-DeepSpeed/megatron/model/module.py", line 139, in half_conversion
    if isinstance(val_typecheck, _FLOAT_TYPES):
TypeError: isinstance() arg 2 must be a type or tuple of types

Since the CUDA accelerator no longer returns a torch type, but functools.partial:

(Pdb) p _FLOAT_TYPES
(<class 'torch.FloatTensor'>, functools.partial(<built-in method tensor of type object at 0x7f3d941c5420>, dtype=torch.float32, device='cuda'))

We hit an issue when training with the Megatron-DeepSpeed repo. For now, we're looking to revert this PR and would appreciate any feedback or suggestions from your end.

Thanks,
Lev

@ShukantPal
Contributor

Hi @lekurile,

The Megatron-DeepSpeed repo would probably have to change how it detects tensor types to handle this, e.g. by calling the tensor factory and checking the resulting dtype; something like this:

        if callable(val_typecheck) and val_typecheck([0]).dtype == torch.float:
            val = float16_convertor(val)
        return val
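A torch-free sketch of that dtype-based check, using a hypothetical `FakeTensor` stand-in in place of the accelerator's `functools.partial` over `torch.tensor` (names below are illustrative, not from either repo):

```python
import functools

# Hypothetical stand-in for a tensor: just records its data and dtype.
class FakeTensor:
    def __init__(self, data, dtype):
        self.data = data
        self.dtype = dtype

# Stand-in for the accelerator's factory, mirroring the real
# functools.partial(torch.tensor, dtype=torch.float32, device='cuda').
float_factory = functools.partial(FakeTensor, dtype="float32")
half_factory = functools.partial(FakeTensor, dtype="float16")

def is_float_factory(val_typecheck):
    # Instead of passing the factory to isinstance(), call it on a tiny
    # input and inspect the dtype of the result.
    return callable(val_typecheck) and val_typecheck([0]).dtype == "float32"

print(is_float_factory(float_factory))  # True
print(is_float_factory(half_factory))   # False
```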

@lekurile
Contributor

Thanks for the quick reply! We've created a PR in Megatron-DeepSpeed updating the type check to check against the accelerator specific dtype. We can close this revert PR and keep the changes.
https://github.com/microsoft/Megatron-DeepSpeed/pull/346/files

@loadams
Contributor Author

loadams commented Feb 21, 2024

Thanks @ShukantPal - closing in favor of @lekurile's PR in Megatron-DeepSpeed here.

@loadams loadams closed this Feb 21, 2024