
TypeError: unsupported operand type(s) for *: 'NoneType' and 'NoneType' in Ranger21.__init__ (ranger21.py line 179: self.total_iterations = num_epochs * num_batches_per_epoch) #12

Open
neuronflow opened this issue Jun 27, 2021 · 6 comments


@neuronflow

I get the following error when starting my training:

Traceback (most recent call last):
  File "tr_baseline.py", line 75, in <module>
    optimizer = Ranger21(params=model.parameters(), lr=learning_rate)
  File "/mnt/Drive1/florian/msblob/Ranger21/ranger21/ranger21.py", line 179, in __init__
    self.total_iterations = num_epochs * num_batches_per_epoch
TypeError: unsupported operand type(s) for *: 'NoneType' and 'NoneType'

initializing ranger with:

# ranger:
optimizer = Ranger21(params=model.parameters(), lr=learning_rate)
@saruarlive

Have you tried it as shown below?

from ranger21 import Ranger21 
optimizer = Ranger21(model.parameters(), lr=1e-2, num_epochs=epochs, num_batches_per_epoch=len(train_loader))
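For anyone unsure where these numbers come from: `num_batches_per_epoch` is just the number of optimizer steps per epoch, which `len(train_loader)` gives you directly. A minimal sketch of the arithmetic, using hypothetical dataset and batch sizes:

```python
import math

# Hypothetical numbers for illustration only.
dataset_size = 10_000   # samples in the training set
batch_size = 32
num_epochs = 50

# len(train_loader) equals ceil(dataset_size / batch_size) when the
# DataLoader is created with drop_last=False (the default).
num_batches_per_epoch = math.ceil(dataset_size / batch_size)

# This is the product Ranger21 computes internally to size its lr
# schedule; it fails with the TypeError above if either input is None.
total_iterations = num_epochs * num_batches_per_epoch
print(num_batches_per_epoch, total_iterations)  # 313 15650
```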

@lessw2020
Owner

Hi @neuronflow,
@saruarlive is correct: we need to know how many epochs and how many iterations per epoch in order to auto-compute the lr schedule.
Clearly our error handling should be improved to make the issue obvious (I thought we were checking for this case), but the error above is basically saying that num_epochs=None and num_batches_per_epoch=None, so it can't do any math with them.
I'll leave this open until I verify and add better error handling, but the core issue is that you need to pass in the total epochs and the number of iterations per epoch (and we need to document this better).
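A guard along these lines (a hypothetical helper for illustration, not Ranger21's actual implementation) would turn the opaque TypeError into a readable message:

```python
def compute_total_iterations(num_epochs, num_batches_per_epoch):
    """Validate the schedule inputs before doing arithmetic on them.

    Hypothetical sketch; not part of Ranger21's API.
    """
    if num_epochs is None or num_batches_per_epoch is None:
        raise ValueError(
            "Ranger21 needs num_epochs and num_batches_per_epoch "
            "(e.g. num_batches_per_epoch=len(train_loader)) to "
            "auto-compute its lr schedule."
        )
    return num_epochs * num_batches_per_epoch

print(compute_total_iterations(50, 313))  # 15650
```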

@neuronflow
Author

Thank you! With the above, training starts but then crashes with this error:

  File "/mnt/Drive3/florian/multi_patch_blob_loss/neuronflow/training/epoch/trainEpoch.py", line 77, in train_epoch
    optimizer.step()
  File "/home/florian/miniconda3/envs/msblob/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/Drive1/florian/msblob/Ranger21/ranger21/ranger21.py", line 570, in step
    self.agc(p)
  File "/mnt/Drive1/florian/msblob/Ranger21/ranger21/ranger21.py", line 398, in agc
    p_norm = self.unit_norm(p).clamp_(self.agc_eps)
  File "/mnt/Drive1/florian/msblob/Ranger21/ranger21/ranger21.py", line 382, in unit_norm
    raise ValueError(
ValueError: unit_norm:: adaptive gclipping: unable to process len of 5 - currently must be <= 4

If I understand correctly, Ranger21 contains an lr scheduler, so it does not make sense to combine it with cosine annealing and warm restarts?

@lessw2020
Owner

Hi @neuronflow,
The ValueError above comes from tensors with more than 4 dimensions, e.g. from 3D convolutions.
If you pull the latest version that I posted last week, adaptive clipping will handle tensors of any dimensionality, so that is resolved.

To your other point: by default Ranger21 handles the lr scheduling internally for you, so you would not want to combine it with cosine annealing or other lr scheduling.
You can of course turn off the internal lr scheduling if you want to compare Ranger21's internal schedule against your own. I wouldn't recommend it, since there's a lot of validation behind the schedule Ranger21 sets, but you can certainly test it out to see.
You can turn off scheduling by removing the warmup:
use_warmup=False,
and the warmdown:
warmdown_active=False,

I can see that it might be simpler if there were a single use_lr_scheduling=True/False flag, so I think I'll add that soon. But for now, turning warmup and warmdown off will have Ranger21 operate as a plain optimizer with no scheduling, and then you can drive the lr with your own schedule.
Hope that helps!
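Putting the pieces from this thread together, a configuration sketch of driving Ranger21 with an external scheduler. This assumes only the keyword names mentioned above (`use_warmup`, `warmdown_active`); `model`, `num_epochs`, and `train_loader` are placeholders from your own training script, and the scheduler choice is just an example:

```python
import torch
from ranger21 import Ranger21

# Internal scheduling off, per the keywords mentioned above.
optimizer = Ranger21(
    model.parameters(),
    lr=1e-2,
    num_epochs=num_epochs,
    num_batches_per_epoch=len(train_loader),
    use_warmup=False,       # no internal warmup
    warmdown_active=False,  # no internal warmdown
)

# An external scheduler can now own the lr, e.g.:
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
```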

@neuronflow
Author

Thank you once again for the fast and detailed responses! With the latest update it seems to work! :)

@neuronflow
Author

One further question: I have a training setup where I use multiple training data loaders with different numbers of batches. Is it possible to apply Ranger21 in this context?
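On the multi-loader point, one plausible approach (my assumption, not something Ranger21 documents) is to pass the combined batch count, since Ranger21 only needs the total number of optimizer steps per epoch:

```python
# Stand-ins for DataLoaders: anything with a len() works for this
# calculation, so plain lists with one entry per batch illustrate it.
loader_a = [None] * 120   # e.g. a loader yielding 120 batches per epoch
loader_b = [None] * 45    # e.g. a loader yielding 45 batches per epoch

# If one epoch iterates over every loader once, the total number of
# optimizer steps per epoch is the sum of their lengths.
num_batches_per_epoch = sum(len(dl) for dl in (loader_a, loader_b))
print(num_batches_per_epoch)  # 165
```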
