Compiling YOLOX model error #976

Open
mvinci12 opened this issue Aug 28, 2024 · 0 comments

Errors:

2024-08-28 23:10:02.000380: 248385 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74/model.MODULE_8858416890750383945+ade7b014.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74/model.MODULE_8858416890750383945+ade7b014.neff', '--model-type=cnn-training', '--verbose=35']: 2024-08-28T23:10:02Z [TEN404] Internal tensorizer error: TensorInitialization:Incorrect IR by <class 'neuronxcc.starfish.penguin.targets.transforms.TensorInitialization.TensorInitialization'> - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.

RuntimeError: Bad StatusOr access: INTERNAL: RunNeuronCCImpl: error condition error != 0: <class 'subprocess.CalledProcessError'>: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74/model.MODULE_8858416890750383945+ade7b014.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74/model.MODULE_8858416890750383945+ade7b014.neff', '--model-type=cnn-training', '--verbose=35']' returned non-zero exit status 70.

File "/home/ubuntu/aws-neuron-samples/torch-neuronx/training/neuron-adoption/yolox/core/trainer.py", line 97, in train_in_iter
xm.mark_step() # Ensure TPU operations are synchronized
│ └ <function mark_step at 0x7fcf8a8be170>
└ <module 'torch_xla.core.xla_model' from '/home/ubuntu/aws-neuron-samples/aws_neuron_venv_pytorch/lib/python3.10/site-packages...

File "/home/ubuntu/aws-neuron-samples/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_xla/core/xla_model.py", line 969, in mark_step
torch_xla._XLAC._xla_step_marker(
│ │ └ <built-in method _xla_step_marker of PyCapsule object at 0x7fcf9a0b6d00>
│ └ <module '_XLAC' from '/home/ubuntu/aws-neuron-samples/aws_neuron_venv_pytorch/lib/python3.10/site-packages/_XLAC.cpython-310-...
└ <module 'torch_xla' from '/home/ubuntu/aws-neuron-samples/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_xla/__in...
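
The compiler message above points at the `XLA_IR_DEBUG` and `XLA_HLO_DEBUG` environment variables as a way to get more information. A minimal sketch of turning them on from Python follows. The helper name `enable_xla_debug` is hypothetical, and setting the variables before `torch_xla` is imported is an assumption made to be on the safe side, not something the log confirms.

```python
import os


def enable_xla_debug(env=os.environ):
    """Set the two debug variables named in the compiler error message.

    `enable_xla_debug` is a hypothetical helper for illustration only.
    """
    env["XLA_IR_DEBUG"] = "1"
    env["XLA_HLO_DEBUG"] = "1"
    # Return what was set so the caller can log or verify it.
    return {name: env[name] for name in ("XLA_IR_DEBUG", "XLA_HLO_DEBUG")}


# Assumption: call this at the top of the training script,
# before torch_xla is imported, then rerun the failing step.
settings = enable_xla_debug()
print(settings)
```

Exporting the same two variables in the shell before launching the training script should have the same effect. The extra detail would then show up in the rerun's output and could be attached to the issue.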
