Hi, I'm wondering for each backend and each dataset, do they support saving iteration state and resume later to continue previous iteration where it stopped?
This feature is required for resuming from a checkpoint during model training.
for example:
```python
dataloader = jdl.DataLoader(ds, backend, shuffle=True)
for i, batch in enumerate(dataloader):
    if i == 100:
        state = dataloader.state_dict()
        break

# re-init the dataloader, and then try to resume from state
dataloader = jdl.DataLoader(ds, backend, shuffle=True)
dataloader.load_state_dict(state)
for batch in dataloader:
    ...
```
Hi @pluiez None of the dataloader backends assume statefulness. For now, you should manually store the state (e.g., batch index, arguments, seeds, etc.), then restore it by re-initializing the dataloader and iterating to the target batch index.
I am not planning to support load_state_dict, as it is not part of the API of the (official) PyTorch dataloader. I might consider supporting StatefulDataLoader in the future, but it is still an experimental feature from the PyTorch team.
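The manual store-and-skip approach can be sketched as below. This is only an illustration, not library code: `make_loader` is a hypothetical stand-in for re-creating the real dataloader (e.g. `jdl.DataLoader`) with the same arguments, and it assumes the shuffle is driven by a seed so the epoch order is reproducible across re-initialization.

```python
import random

def make_loader(data, seed):
    """Hypothetical stand-in for re-creating a dataloader with a fixed seed."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)  # deterministic shuffle per seed
    return (data[i] for i in order)

data = list(range(10))
state = {"seed": 42, "batches_seen": 0}

# --- training: stop after 4 batches, recording how many were consumed ---
loader = make_loader(data, state["seed"])
for i, batch in enumerate(loader):
    if i == 4:
        break
    state["batches_seen"] = i + 1

# --- resume: rebuild the loader with the same seed, skip consumed batches ---
loader = make_loader(data, state["seed"])
for _ in range(state["batches_seen"]):
    next(loader)  # fast-forward past already-seen batches
resumed = list(loader)
```

The key requirement is that the same seed reproduces the same shuffled order, so skipping `batches_seen` items lands exactly where the previous run stopped.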
I have checked the StatefulDataLoader implementation; it supports resuming by simply skipping the same number of batches that were yielded before checkpointing. That works, but it is somewhat inefficient. I'm looking for a good practice for memory-efficient, fast dataset iteration that supports both shuffling and resumable iteration. Thank you for your information.
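For the efficiency concern above, one common alternative to skip-based resumption is to index directly into a seeded per-epoch permutation at the saved offset, which avoids yielding and discarding batches. This is a minimal sketch under that assumption; the function names are illustrative and not part of any library API.

```python
import random

def epoch_order(n, seed, epoch):
    """One fixed permutation per (seed, epoch) pair; a string seed keeps it deterministic."""
    order = list(range(n))
    random.Random(f"{seed}:{epoch}").shuffle(order)
    return order

def iterate_from(data, seed, epoch, start):
    """Resume by jumping straight to the saved offset in the permutation."""
    order = epoch_order(len(data), seed, epoch)
    for pos in range(start, len(order)):  # O(1) jump, no discarded batches
        yield data[order[pos]]

data = [f"sample-{i}" for i in range(8)]
# resume epoch 3 at offset 5: nothing is yielded and thrown away
resumed = list(iterate_from(data, seed=7, epoch=3, start=5))
```

The checkpoint then only needs to record `(seed, epoch, offset)`, and resumption cost is independent of how far into the epoch training stopped.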