iteration resuming #37

Open · pluiez opened this issue Oct 30, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@pluiez commented Oct 30, 2024

Hi, I'm wondering whether each backend and each dataset support saving the iteration state and resuming later, so that iteration can continue from where it previously stopped.

This feature is required for resuming from a checkpoint during model training.

For example:

dataloader = jdl.DataLoader(ds, backend, shuffle=True)
for i, batch in enumerate(dataloader):
    if i == 100:
        # save the iteration state and stop here
        state = dataloader.state_dict()
        break

# re-init the dataloader, and then try to resume from the saved state
dataloader = jdl.DataLoader(ds, backend, shuffle=True)
dataloader.load_state_dict(state)

for batch in dataloader:
    ...
@BirkhoffG (Owner) commented

Hi @pluiez, none of the dataloader backends assume statefulness. For now, you should manually store the relevant state (e.g., batch index, arguments, seeds) and restore it by re-initializing the dataloader and iterating to the target batch index.

I am not planning to support load_state_dict, as it is not part of the API of the (official) PyTorch DataLoader. I might consider supporting StatefulDataLoader in the future, but it is still an experimental feature from the PyTorch team.
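
A minimal sketch of this manual approach (illustrative only, not part of the jax_dataloader API; it assumes the shuffle order can be reproduced across runs, e.g. via a fixed seed, and the helper names below are made up):

import itertools
import jax_dataloader as jdl

def save_state(batch_idx):
    # Persist whatever is needed to rebuild the iteration:
    # the batch index, the DataLoader arguments, and any RNG seed used for shuffling.
    return {"batch_idx": batch_idx}

def resume(ds, backend, state):
    # Re-create the dataloader with identical arguments, then fast-forward
    # past the batches that were already consumed before the checkpoint.
    dataloader = jdl.DataLoader(ds, backend, shuffle=True)
    return itertools.islice(iter(dataloader), state["batch_idx"], None)

# Usage after restoring `state` from disk:
#   for batch in resume(ds, backend, state):
#       ...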

@BirkhoffG added the enhancement (New feature or request) label Oct 30, 2024
@pluiez (Author) commented Nov 11, 2024

I have checked the StatefulDataLoader implementation; it does support resuming, simply by skipping the same number of batches that had been yielded before checkpointing. That is a workable but somewhat inefficient solution. I'm looking for a good practice for memory-efficient, fast dataset iteration that supports both shuffling and resumable iteration. Thank you for the information.
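
For what it's worth, one common pattern that avoids re-reading the skipped data is to derive each epoch's permutation from (seed, epoch), so a checkpoint only needs to store the epoch and batch index, and resuming can jump straight to the right offset. A rough sketch (not jax_dataloader-specific; names and arguments are illustrative):

import numpy as np

def shuffled_batches(dataset_len, batch_size, seed, epoch, start_batch=0):
    # Rebuild this epoch's permutation deterministically from (seed, epoch),
    # then yield index batches starting from the resume position.
    perm = np.random.default_rng((seed, epoch)).permutation(dataset_len)
    num_batches = dataset_len // batch_size  # drops the last partial batch
    for b in range(start_batch, num_batches):
        yield perm[b * batch_size : (b + 1) * batch_size]

# Resuming from a checkpoint taken at epoch=3, batch_idx=100:
#   for idx in shuffled_batches(len(ds), 32, seed=0, epoch=3, start_batch=100):
#       batch = ds[idx]  # gather the batch from the dataset by index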
