`Timer`'s misleading behaviour when epoch completion time calculated #2157

Priyansi · 2021-08-11T11:26:38Z

When I tried calculating the time taken to complete a single epoch via Timer, the handlers attached to trainer before Timer were executed first, and thus their time also got recorded by the Timer too for a single epoch. Therefore the true time taken for epoch completion, provided by trainer.state.times, is less than what Timer calculated. This can be misleading. More clarification on how this actually works in the docs would be appreciated. Or the Timer's functionality can be enhanced to step before all other handlers attached to an event could also be helpful. Notebook to quickly verify this here.

The text was updated successfully, but these errors were encountered:

sdesrozis · 2021-08-16T22:49:43Z

The logic is here

ignite/ignite/engine/engine.py

Line 745 in 38f30c4

def _internal_run(self) -> State:

The state timers should return the times for specific events. However, if an handler, during a given event, tries accessing the state timer of this event, the time is not yet computed. Yet the value in the timer is the time spent in the previous events rather than an undefined value.

I agree that it should be explained, or modified. For instance, the state timer of an event could be lazily updated each time it is reached by the user in the event. It could be done but not sure it really worth…

FarehaNousheen · 2021-10-16T16:58:10Z

Hi, I would like to work on this issue, please assign this to me. Also, please provide examples to understand the issue better.

vfdev-5 · 2021-10-18T08:28:58Z

Also, please provide examples to understand the issue better.

@FarehaNousheen please read attentively #2157 (comment). There is a notebook with a concrete example provided by Priyansi

vfdev-5 · 2021-11-08T12:30:13Z

Things to do for this issue:

Reproduce the results from https://colab.research.google.com/drive/1xX7slM9BiDSWYGId5zB_8ISneJ0wezAM
- rerun the notebook and confirm that there is a difference of 15-20 seconds between time measurements for the single epoch. For example:

Time taken by a single epoch calculated by Timer: 197.58
Time Taken for single epoch calculated by State of engine : 170.89924907684326

Modify the notebook such that timer is attached before running the validation:
Current notebook

trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
evaluator = create_supervised_evaluator(
    model, metrics={"accuracy": Accuracy(), "loss": Loss(criterion)}, device=device
)

@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(trainer):
    evaluator.run(train_loader)
    metrics = evaluator.state.metrics
    print(
        f"Training Results - Epoch[{trainer.state.epoch}] Avg accuracy: {metrics['accuracy']:.2f} Avg loss: {metrics['loss']:.2f}"
    )
    print(
    f"Time taken by a single epoch calculated by Timer: {timer.value():.2f}"
    )
    print(
        f"Time Taken for single epoch calculated by State of engine : {trainer.state.times['EPOCH_COMPLETED']}"
    )


@trainer.on(Events.EPOCH_COMPLETED)
def log_validation_results(trainer):
    evaluator.run(val_loader)
    metrics = evaluator.state.metrics
    print(
        f"Validation Results - Epoch[{trainer.state.epoch}] Avg accuracy: {metrics['accuracy']:.2f} Avg loss: {metrics['loss']:.2f}"
    )
    print(
    f"Time taken by a single epoch calculated by Timer: {timer.value():.2f}"
    )
    print(
        f"Time Taken for single epoch calculated by State of engine : {trainer.state.times['EPOCH_COMPLETED']}"
    )

timer = Timer(average=True)
timer.attach(trainer,
             start=Events.EPOCH_STARTED,
             resume=Events.EPOCH_STARTED,
             pause=Events.EPOCH_COMPLETED,
             step=Events.EPOCH_COMPLETED)

Expected modfication:

trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
evaluator = create_supervised_evaluator(
    model, metrics={"accuracy": Accuracy(), "loss": Loss(criterion)}, device=device
)

timer = Timer(average=True)
timer.attach(trainer,
             start=Events.EPOCH_STARTED,
             resume=Events.EPOCH_STARTED,
             pause=Events.EPOCH_COMPLETED,
             step=Events.EPOCH_COMPLETED)

@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(trainer):
    evaluator.run(train_loader)
    metrics = evaluator.state.metrics
    print(
        f"Training Results - Epoch[{trainer.state.epoch}] Avg accuracy: {metrics['accuracy']:.2f} Avg loss: {metrics['loss']:.2f}"
    )
    print(
    f"Time taken by a single epoch calculated by Timer: {timer.value():.2f}"
    )
    print(
        f"Time Taken for single epoch calculated by State of engine : {trainer.state.times['EPOCH_COMPLETED']}"
    )


@trainer.on(Events.EPOCH_COMPLETED)
def log_validation_results(trainer):
    evaluator.run(val_loader)
    metrics = evaluator.state.metrics
    print(
        f"Validation Results - Epoch[{trainer.state.epoch}] Avg accuracy: {metrics['accuracy']:.2f} Avg loss: {metrics['loss']:.2f}"
    )
    print(
    f"Time taken by a single epoch calculated by Timer: {timer.value():.2f}"
    )
    print(
        f"Time Taken for single epoch calculated by State of engine : {trainer.state.times['EPOCH_COMPLETED']}"
    )

Re-run the notebook and report here the results for the time measurements for a single epoch
- put the logs here
- put the link on the colab here and make sure that colab is accessible to everyone

FarehaNousheen · 2021-11-15T18:42:08Z

That's a really detailed description. Thank you for sharing it. I'll work on it tomorrow.

FarehaNousheen · 2021-11-24T18:32:28Z

Hi all, I have modified the notebook with the timer attached before running the validation.
Sharing the link of Google colab here
https://colab.research.google.com/drive/1SJYqgmzYvGivruq9OAS0q6MADjBFqI13

FarehaNousheen · 2021-12-07T15:42:53Z

I executed the notebook with the Timer attached before the second code snippet and removed the cell where the Timer is attached after validation. In doing so I got the result that is updated on the copy of the Colab shared above. In short sharing results below.
Training Results - Epoch[1] Avg accuracy: 0.95 Avg loss: 0.16
Time taken by a single epoch calculated by Timer: 73.73
Time Taken for single epoch calculated by State of engine : 73.72484469413757
Validation Results - Epoch[1] Avg accuracy: 0.95 Avg loss: 0.15
Time taken by a single epoch calculated by Timer: 73.73
Time Taken for single epoch calculated by State of engine : 73.72484469413757
Training Results - Epoch[2] Avg accuracy: 0.97 Avg loss: 0.13
Time taken by a single epoch calculated by Timer: 73.51
Time Taken for single epoch calculated by State of engine : 73.5108630657196
Validation Results - Epoch[2] Avg accuracy: 0.96 Avg loss: 0.13
Time taken by a single epoch calculated by Timer: 73.51
Time Taken for single epoch calculated by State of engine : 73.5108630657196

vfdev-5 added the enhancement label Aug 11, 2021

vfdev-5 added the help wanted label Aug 11, 2021

vfdev-5 added the Hacktoberfest label Oct 1, 2021

vfdev-5 assigned FarehaNousheen Oct 18, 2021

vfdev-5 added the PyDataGlobal PyData Global 2020 Sprint label Oct 21, 2021

vfdev-5 linked a pull request May 2, 2022 that will close this issue

Doc fixes #2471

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Timer`'s misleading behaviour when epoch completion time calculated #2157

`Timer`'s misleading behaviour when epoch completion time calculated #2157

Priyansi commented Aug 11, 2021

sdesrozis commented Aug 16, 2021

FarehaNousheen commented Oct 16, 2021

vfdev-5 commented Oct 18, 2021

vfdev-5 commented Nov 8, 2021

FarehaNousheen commented Nov 15, 2021

FarehaNousheen commented Nov 24, 2021 •

edited

Loading

FarehaNousheen commented Dec 7, 2021

Timer's misleading behaviour when epoch completion time calculated #2157

Timer's misleading behaviour when epoch completion time calculated #2157

Comments

Priyansi commented Aug 11, 2021

sdesrozis commented Aug 16, 2021

FarehaNousheen commented Oct 16, 2021

vfdev-5 commented Oct 18, 2021

vfdev-5 commented Nov 8, 2021

FarehaNousheen commented Nov 15, 2021

FarehaNousheen commented Nov 24, 2021 • edited Loading

FarehaNousheen commented Dec 7, 2021

`Timer`'s misleading behaviour when epoch completion time calculated #2157

`Timer`'s misleading behaviour when epoch completion time calculated #2157

FarehaNousheen commented Nov 24, 2021 •

edited

Loading