Situation: during the 17 Oct GZ retraining job, the job preparation script failed, leaving the node in an unusable state. Cliff rebooted the node, which allowed the job (still in the "active" state in the training pool) to start and run correctly. However, at the end of the job, a FileUploadAccessDenied error occurred when the node tried to upload the stdout and stderr log files to the kadeactivelearning blob storage account. See error details:
Hypotheses:
The signature provided to the training job for the upload expired before the upload could occur. Solution: check the signature's expiry time and increase its validity period (to allow for delays such as human-triggered node reboots).
???
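To test the expiry hypothesis, one option is to read the expiry time straight out of the upload URL. The sketch below assumes the signature is an Azure Storage shared access signature (SAS) URL, which carries its expiry in the `se` query parameter as a UTC timestamp. The function name, the example URL, and the dates are hypothetical and not taken from the actual job configuration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs

# Timestamp layouts commonly seen in the `se` parameter (an assumption;
# extend this list if the real token uses another layout).
_EXPIRY_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%MZ", "%Y-%m-%d")


def sas_time_remaining(sas_url, now=None):
    """Return a timedelta until the SAS in `sas_url` expires.

    A negative result means the signature has already expired.
    """
    query = parse_qs(urlparse(sas_url).query)
    if "se" not in query:
        raise ValueError("URL has no 'se' (signed expiry) parameter")
    expiry_text = query["se"][0]

    expiry = None
    for fmt in _EXPIRY_FORMATS:
        try:
            expiry = datetime.strptime(expiry_text, fmt)
            break
        except ValueError:
            continue
    if expiry is None:
        raise ValueError("Unrecognised expiry format: " + expiry_text)

    # SAS expiry times are expressed in UTC.
    expiry = expiry.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return expiry - now


if __name__ == "__main__":
    # Hypothetical URL for illustration only; not the real container or token.
    url = ("https://example.blob.core.windows.net/logs"
           "?sp=w&se=2030-01-01T12%3A00%3A00Z&sig=PLACEHOLDER")
    print(sas_time_remaining(url))
```

Comparing this value with the job's expected duration, plus some slack for manual interventions such as a node reboot, would show whether the validity period needs to be extended.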
Another solution: prevent the job preparation task from failing so that everything runs as expected. Possible approach: add a sleep so the node can establish network connectivity before trying to mount directories or drives.
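A fixed sleep may be too short on a slow boot and wasteful on a fast one, so a polling wait is one alternative worth considering. The sketch below retries a TCP connection until it succeeds or a deadline passes. The function name, the default port, and the timing values are assumptions for illustration; the actual job preparation script and the host it mounts from are not shown in this issue.

```python
import socket
import time


def wait_for_network(host, port=443, timeout_s=120.0, interval_s=5.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once a connection is made, or False if `timeout_s`
    seconds pass without one.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Opening and immediately closing a connection is enough to
            # show that name resolution and routing are working.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Covers DNS failures, refused connections, and timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

The job preparation task could call this with the storage endpoint it is about to mount and continue to the mount step only when it returns True, failing with a clear message otherwise.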