JOB_STATE_FAILED for cifar10_tensorflow #19

Open
nayakanuj opened this issue Apr 26, 2022 · 0 comments

I am unable to launch an example script. The command and the console output, including the error, are below.
I am running the command from the PyCharm terminal. The job launches but fails immediately with the state "JOB_STATE_FAILED".

% sudo xmanager launch ./examples/cifar10_tensorflow/launcher.py

Console output and error (partial):
[+] Building 0.5s (16/16) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 694B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for gcr.io/deeplearning-platform-release/tf2-gpu.2-6:latest 0.4s
=> [ 1/11] FROM gcr.io/deeplearning-platform-release/tf2-gpu.2-6@sha256:<"a bunch of HEX digits"> 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 8.07kB 0.0s
=> CACHED [ 2/11] RUN if ! id 1000; then useradd -m -u 1000 clouduser; fi 0.0s
=> CACHED [ 3/11] RUN apt-get update && apt-get install -y git netcat 0.0s
=> CACHED [ 4/11] RUN python -m pip install --upgrade pip 0.0s
=> CACHED [ 5/11] COPY cifar10_tensorflow/requirements.txt /cifar10_tensorflow/requirements.txt 0.0s
=> CACHED [ 6/11] RUN python -m pip install -r cifar10_tensorflow/requirements.txt 0.0s
=> CACHED [ 7/11] COPY cifar10_tensorflow/ /cifar10_tensorflow 0.0s
=> CACHED [ 8/11] RUN chown -R 1000:root /cifar10_tensorflow && chmod -R 775 /cifar10_tensorflow 0.0s
=> CACHED [ 9/11] WORKDIR cifar10_tensorflow 0.0s
=> CACHED [10/11] COPY entrypoint.sh ./entrypoint.sh 0.0s
=> CACHED [11/11] RUN chown -R 1000:root ./entrypoint.sh && chmod -R 775 ./entrypoint.sh 0.0s
=> exporting to image 0.0s
=> => exporting layers
...
{"status":"Waiting","progressDetail":{},"id": ....
{"status":"Layer already exists","progressDetail":{},"id": ....
Your image URI is:
Job launched at: https://console.cloud.google.com/ai/platform/locations//training/
current state: JobState.JOB_STATE_QUEUED
current state: JobState.JOB_STATE_PENDING
current state: JobState.JOB_STATE_FAILED
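
For anyone trying to reproduce this, the state transitions above can be pulled out of a saved console log with a few lines of Python. This is only a sketch for summarising the log: the helper names `extract_job_states` and `final_state` are hypothetical and not part of xmanager, and the pattern assumes the `current state: JobState.<NAME>` line format shown in the output above. It does not reveal why the job failed; that information is not in this output.

```python
import re
from typing import List, Optional

# Matches lines such as "current state: JobState.JOB_STATE_FAILED".
# The line format is assumed from the console output quoted in this issue.
STATE_LINE = re.compile(r"current state:\s*JobState\.(JOB_STATE_\w+)")


def extract_job_states(console_output: str) -> List[str]:
    """Return every reported job state, in the order it was printed."""
    return STATE_LINE.findall(console_output)


def final_state(console_output: str) -> Optional[str]:
    """Return the last reported job state, or None if none was printed."""
    states = extract_job_states(console_output)
    return states[-1] if states else None


sample = """\
current state: JobState.JOB_STATE_QUEUED
current state: JobState.JOB_STATE_PENDING
current state: JobState.JOB_STATE_FAILED
"""

print(extract_job_states(sample))
print(final_state(sample))
```

Run against the log above, this reports the sequence QUEUED, PENDING, FAILED, which matches the job failing right after it was scheduled.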
