Timer's misleading behaviour when epoch completion time calculated #2157
Comments
The logic is here ignite/ignite/engine/engine.py Line 745 in 38f30c4
The state timers should return the times for specific events. However, if a handler, during a given event, tries to access the state timer of this event, the time is not yet computed. The value in the timer at that point is the time spent in the previous events rather than an undefined value. I agree that this should be explained or modified. For instance, the state timer of an event could be lazily updated each time the user reads it during the event. It could be done, but I am not sure it is really worth…
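One way to picture the lazy update suggested above is the small stand-alone sketch below. It is not ignite code: LazyEventTimes, its start/stop methods and the injectable clock are hypothetical names used only for illustration. The idea is that reading the entry of an event that is still running computes the elapsed time at that moment, instead of returning a value left over from earlier.

```python
import time


class LazyEventTimes:
    """Hypothetical per-event timing record (an illustration, not ignite's State)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock      # injectable so the behaviour can be checked deterministically
        self._finished = {}      # event name -> duration of the last completed run
        self._started = {}       # event name -> start timestamp while the event is running

    def start(self, name):
        """Mark the beginning of an event."""
        self._started[name] = self._clock()

    def stop(self, name):
        """Mark the end of an event and store its final duration."""
        self._finished[name] = self._clock() - self._started.pop(name)

    def __getitem__(self, name):
        # While the event is running, compute the elapsed time on access
        # rather than returning a value recorded before the event began.
        if name in self._started:
            return self._clock() - self._started[name]
        return self._finished[name]
```

With a record like this, a handler that reads the entry in the middle of an event would see the time elapsed so far, and the final value would be fixed once stop is called. Whether the extra bookkeeping is worth it is exactly the open question raised in the comment.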
Hi, I would like to work on this issue; please assign it to me. Also, please provide examples so that I can understand the issue better.
@FarehaNousheen please read #2157 (comment) attentively. There is a notebook with a concrete example provided by Priyansi.
Things to do for this issue:
trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
evaluator = create_supervised_evaluator(
    model, metrics={"accuracy": Accuracy(), "loss": Loss(criterion)}, device=device
)

@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(trainer):
    evaluator.run(train_loader)
    metrics = evaluator.state.metrics
    print(
        f"Training Results - Epoch[{trainer.state.epoch}] Avg accuracy: {metrics['accuracy']:.2f} Avg loss: {metrics['loss']:.2f}"
    )
    print(
        f"Time taken by a single epoch calculated by Timer: {timer.value():.2f}"
    )
    print(
        f"Time Taken for single epoch calculated by State of engine : {trainer.state.times['EPOCH_COMPLETED']}"
    )

@trainer.on(Events.EPOCH_COMPLETED)
def log_validation_results(trainer):
    evaluator.run(val_loader)
    metrics = evaluator.state.metrics
    print(
        f"Validation Results - Epoch[{trainer.state.epoch}] Avg accuracy: {metrics['accuracy']:.2f} Avg loss: {metrics['loss']:.2f}"
    )
    print(
        f"Time taken by a single epoch calculated by Timer: {timer.value():.2f}"
    )
    print(
        f"Time Taken for single epoch calculated by State of engine : {trainer.state.times['EPOCH_COMPLETED']}"
    )

# Timer is attached after the logging handlers above.
timer = Timer(average=True)
timer.attach(trainer,
             start=Events.EPOCH_STARTED,
             resume=Events.EPOCH_STARTED,
             pause=Events.EPOCH_COMPLETED,
             step=Events.EPOCH_COMPLETED)

Expected modification:
trainer = create_supervised_trainer(model, optimizer, criterion, device=device)
evaluator = create_supervised_evaluator(
    model, metrics={"accuracy": Accuracy(), "loss": Loss(criterion)}, device=device
)

# Timer is attached before the logging handlers below.
timer = Timer(average=True)
timer.attach(trainer,
             start=Events.EPOCH_STARTED,
             resume=Events.EPOCH_STARTED,
             pause=Events.EPOCH_COMPLETED,
             step=Events.EPOCH_COMPLETED)

@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(trainer):
    evaluator.run(train_loader)
    metrics = evaluator.state.metrics
    print(
        f"Training Results - Epoch[{trainer.state.epoch}] Avg accuracy: {metrics['accuracy']:.2f} Avg loss: {metrics['loss']:.2f}"
    )
    print(
        f"Time taken by a single epoch calculated by Timer: {timer.value():.2f}"
    )
    print(
        f"Time Taken for single epoch calculated by State of engine : {trainer.state.times['EPOCH_COMPLETED']}"
    )

@trainer.on(Events.EPOCH_COMPLETED)
def log_validation_results(trainer):
    evaluator.run(val_loader)
    metrics = evaluator.state.metrics
    print(
        f"Validation Results - Epoch[{trainer.state.epoch}] Avg accuracy: {metrics['accuracy']:.2f} Avg loss: {metrics['loss']:.2f}"
    )
    print(
        f"Time taken by a single epoch calculated by Timer: {timer.value():.2f}"
    )
    print(
        f"Time Taken for single epoch calculated by State of engine : {trainer.state.times['EPOCH_COMPLETED']}"
    )
That's a really detailed description. Thank you for sharing it. I'll work on it tomorrow.
Hi all, I have modified the notebook so that the timer is attached before running the validation.
I executed the notebook with the Timer attached before the second code snippet and removed the cell where the Timer was attached after validation. The results are in the updated copy of the Colab shared above. In short, I am sharing the results below.
When I tried calculating the time taken to complete a single epoch via Timer, the handlers attached to trainer before Timer were executed first, and thus their time also got recorded by the Timer for a single epoch. Therefore the true time taken for epoch completion, provided by trainer.state.times, is less than what Timer calculated. This can be misleading. More clarification in the docs on how this actually works would be appreciated. Alternatively, enhancing the Timer's functionality so that it steps before all other handlers attached to an event could also be helpful. Notebook to quickly verify this here.
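The ordering effect described above can be reproduced without ignite in a toy model. The sketch below is only an illustration under stated assumptions: MiniEngine, FakeClock and measured_epoch_time are hypothetical names, the durations are made up, and the only behaviour borrowed from the issue is that handlers for an event run in the order they were attached.

```python
class FakeClock:
    """Deterministic clock so the example needs no real waiting."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class MiniEngine:
    """Toy dispatcher (not ignite's Engine): handlers fire in registration order."""

    def __init__(self):
        self._handlers = []

    def on_epoch_completed(self, handler):
        self._handlers.append(handler)

    def fire_epoch_completed(self):
        for handler in self._handlers:
            handler()


def measured_epoch_time(timer_first, train_seconds=10.0, eval_seconds=4.0):
    """Return the epoch time seen by a timer that pauses on epoch completion."""
    clock = FakeClock()
    engine = MiniEngine()
    recorded = {}
    epoch_start = clock.now

    def pause_timer():
        # Stands in for the Timer pausing at EPOCH_COMPLETED.
        recorded["epoch"] = clock.now - epoch_start

    def run_evaluation():
        # Stands in for a logging handler that runs an evaluator.
        clock.advance(eval_seconds)

    order = [pause_timer, run_evaluation] if timer_first else [run_evaluation, pause_timer]
    for handler in order:
        engine.on_epoch_completed(handler)

    clock.advance(train_seconds)      # the training part of the epoch
    engine.fire_epoch_completed()     # handlers run in the order attached
    return recorded["epoch"]


print(measured_epoch_time(timer_first=True))   # timer attached before the evaluation handler
print(measured_epoch_time(timer_first=False))  # timer attached after the evaluation handler
```

In this toy setup the timer attached first sees only the training time, while the timer attached last also counts the evaluation run by the earlier handler, which matches the discrepancy reported between Timer and trainer.state.times.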