Thank you for reaching out. The utilization plots you showed are a good place to start a more thorough analysis. I recommend capturing a profile with Nsight Systems to learn more details. It may be that a piece of CPU code in the training loop is stalling the GPU work, in which case DALI is not the bottleneck but is simply providing data at the pace the training can consume it.
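For reference, a typical Nsight Systems capture of a training run can be taken from the command line like this (a sketch; `train.py` is a placeholder for your actual training entry point):

```shell
# Trace CUDA kernels, NVTX ranges (DALI emits NVTX annotations for its
# pipeline stages), and OS runtime calls; writes dali_overlap.nsys-rep,
# which can be opened in the Nsight Systems GUI to see whether DALI's
# decoding work and the training kernels overlap on the timeline.
nsys profile -t cuda,nvtx,osrt -o dali_overlap python train.py
```

In the resulting timeline, gaps in the CUDA stream rows during training iterations usually point at CPU-side stalls rather than at the DALI pipeline itself.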
Describe the question.
While optimizing a training pipeline, I observed a GPU utilization pattern suggesting that the DALI pipeline and the training code run sequentially rather than in parallel, as the DALI documentation led me to expect.
Here you can see that the model is training when the GPU CUDA utilization (blue) spikes, but during training the GPU decoding utilization (green) stops. Overall GPU decoding utilization also seems very low. Is there some way to ensure the decoder is always running so the downstream model doesn't stop training?
This is a very simple GPU video processing pipeline in DALI that decodes, resizes, and then pads videos. I am using this pipeline to train a downstream model. Here are some of the parameters used to configure the pipeline:
And here is the Python code where these are used:
If this is expected behavior, that is fine, but I want to make sure there isn't a flag or misconfiguration causing this performance.
Thanks for your help!
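For context on what overlapped execution should look like: DALI prefetches batches ahead of the training step (controlled by the pipeline's `prefetch_queue_depth` argument), so decoding of batch N+1 can proceed while the model trains on batch N. The following is a plain-Python sketch of that producer/consumer pattern, not DALI itself; `decode_batch` and `train_step` are hypothetical stand-ins:

```python
import queue
import threading
import time

def decode_batch(i):
    # Hypothetical stand-in for GPU video decoding of batch i.
    time.sleep(0.01)
    return f"batch-{i}"

def train_step(batch):
    # Hypothetical stand-in for one training iteration on a batch.
    time.sleep(0.01)

def run(num_batches, prefetch_depth=2):
    # Bounded queue: the producer runs at most `prefetch_depth` batches
    # ahead of the consumer, analogous to DALI's prefetch_queue_depth.
    q = queue.Queue(maxsize=prefetch_depth)

    def producer():
        for i in range(num_batches):
            q.put(decode_batch(i))   # blocks when the queue is full
        q.put(None)                  # sentinel: no more batches

    threading.Thread(target=producer, daemon=True).start()

    processed = []
    while True:
        batch = q.get()              # decoding overlaps with train_step
        if batch is None:
            break
        train_step(batch)
        processed.append(batch)
    return processed

print(run(4))  # ['batch-0', 'batch-1', 'batch-2', 'batch-3']
```

When this overlap is working, decode time is hidden behind training time; if the two run sequentially instead, total iteration time is roughly their sum, which matches the alternating utilization pattern described above.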